Spectral modular arithmetic method and apparatus

ABSTRACT

A new hardware architecture is disclosed that performs the modular exponentiation operation, i.e., the computation of c=m e  mod n where c, m, e, n are large integers. The modular exponentiation operation is the most common operation in public-key cryptography. The new method, named the Spectral Modular Exponentiation method, uses the Discrete Fourier Transform over a finite ring, and relies on new techniques to perform the modular multiplication and reduction operations. The method yields an efficient and highly parallel architecture for hardware implementations of public-key cryptosystems which use the modular exponentiation operation as the basic step, such as the RSA and Diffie-Hellman algorithms. The method is extended to perform the multiplication operation in extension fields which is necessary to perform exponentiation or various other operations over these extension fields.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. provisional applications60/676,108, filed Apr. 28, 2005 and 60/731,142, filed Oct. 28, 2005,which are incorporated herein by reference.

FIELD

The disclosure pertains to cryptographic methods and apparatus.

BACKGROUND

Modular multiplication and modular exponentiation are importantoperations in many cryptographic systems. Modular multiplicationinvolves finding a product c=ab and then dividing the product c by amodulus M to find a remainder that is referred to a modular product. Theresult of modular multiplication of a and b performed modulo-M isgenerally written as c≡ab mod M. The modular multiplication operation isalso used to perform modular exponentiation.

Modular multiplication and exponentiation are used in the Diffie-Hellmanand RSA public-key cryptosystems, described in, for example, W. Diffieand M. E. Hellman, “New Directions in Cryptography,” IEEE Trans. onInformation Theory, vol. 22, pp. 644-654 (1976), and R. L. Rivest, A.Shamir, and L. Adelman, “A Method for Obtaining Digital Signatures andPublic-key Cryptosystems,” Communications of the ACM, vol. 21, pp.120-126 (1978). Modular multiplication is also used in elliptic keycryptography over the finite field GF(2^(k)) and in discreteexponentiation over GF(2^(k)). These applications are described in Q. K.Koç and T. Acar, “Fast Software Exponentiation in GF(2^(k)),” in T.Lang, J.-M. Muller, and N. Takagi, eds., Proceedings, 13th Symposium onComputer Arithmetic, pp. 225-231 (Asilomar, California, Jul. 6-9, 1997).The manipulation of the very large numbers used in these and othercryptographic systems can require complex systems to obtain satisfactoryprocessing speeds. Accordingly, improved methods and apparatus formodular multiplication and exponentiation are needed.

SUMMARY

We describe a new hardware architecture to perform the modularexponentiation operation, i.e., the computation of c=m^(e) mod n wherec, m, e, n are large integers. The modular exponentiation operation isthe most common operation in public-key cryptography. The new method,named the Spectral Modular Exponentiation method, uses the DiscreteFourier Transform over a finite ring, and relies on new techniques toperform the modular multiplication and reduction operations. The methodyields an efficient and highly parallel architecture for hardwareimplementations of public-key cryptosystems which use the modularexponentiation operation as the basic step, such as the RSA andDiffie-Hellman algorithms. The method with small modifications can alsobe extended to perform the multiplication operation in extension fieldswhich is necessary to perform exponentiation operation in groupstructures defined over these extensions, e.g. elliptic curve groups orJacobians of hyperelliptic curves. We further describe the ellipticcurve point multiplication operation in elliptic curves defined over theprime and extension Galois fields.

The foregoing and other objects, features, and advantages of thedisclosed technology will become more apparent from the followingdetailed description, which proceeds with reference to the accompanyingfigures.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1.a-1.b illustrate modular operations.

FIG. 2 illustrates Spectral Modular Reduction (SMR).

FIG. 3 illustrates the flow diagram of SMR.

FIG. 4 illustrates the impact of Spectral Modular Exponentiation (SME).

FIG. 5 illustrates the flow diagram of Spectral Modular Multiplication(SMM).

FIG. 6 illustrates the flow of Spectral Modular Exponentiation (SME).

FIG. 7 illustrates a hardware architecture for SMM.

FIG. 8 is a representative architecture for step 1 of SMM, SMM_(—)2^(k)and SMM_q^(k)

FIG. 9 illustrates partial interpolation for SMM_(—)2^(k).

FIG. 10 illustrates parameter generation logic (PGL) for SMM_(—)2^(k).

FIG. 11 illustrates a representative processing unit for SMM_(—)2^(k).

FIG. 12 illustrates squaring used in Step 1 in a Finite Field with ONB.

FIG. 13 illustrates partial interpolation for SMM and SMM_q^(k).

FIG. 14 illustrates PGL for SMM_q^(k).

FIG. 15 illustrates a representative processing unit for SMM_q^(k).

FIG. 16 illustrates PGL for SMM.

FIG. 17 illustrates a representative processing unit for SMM.

DETAILED DESCRIPTION

As used in this application and in the claims, the singular forms “a,”“an,” and “the” include the plural forms unless the context clearlydictates otherwise. Additionally, the term “includes” means “comprises.”Disclosed below are representative embodiments of cryptographic methods,systems, and apparatus that can be configured for encryption,decryption, and other operations. While particular examples andapplications for the disclosed embodiments are also disclosed, thedescribed systems, methods, and apparatus should not be construed aslimiting in any way. Instead, the present disclosure is directed towardall novel and nonobvious features, aspects, and equivalents of thevarious disclosed embodiments, alone and in various combinations andsub-combinations with one another. The disclosed systems, methods, andapparatus are not limited to any specific aspect, feature, orcombination thereof, nor do the disclosed systems, methods, andapparatus require that any one or more specific advantages be present orproblems be solved. In addition, the attached figures may not show thevarious ways in which the disclosed systems, methods, and apparatus canbe used in conjunction with other systems, methods, and apparatus, andcomponents that are well known in the art are not described in detail.

1 INTRODUCTION

1.1 Approach

The exponentiation operation is the most common operation in public-keycryptography. The Spectral Modular Exponentiation (SME) method is a newmethod for modular exponentiation of large integers. It takes theintegers m, e, n as input, and computes c as the output such thatc=m ^(e)(mod n).

Clearly, a modular exponentiation computation consists of severalmodular multiplications with respect to the same modulus. A simple modelfor a modular multiplication, is illustrated in FIG. 1.a; the inputs arefirst multiplied and then reduced with respect to the same modulus.Therefore, it is possible to visualize the exponentiation as a spiral ofmodular multiplications as seen in FIG. 1.b. There exist many methodsfor computing modular exponentiation, varying according to the choicesof integer multiplication and modular reduction; for instance Karatsubamultiplication followed by a Montgomery reduction is one of the mostpopular approach. Some interleaved methods are also widely accepted buthas less important for our discussion.

Recall that spectral techniques give the asymptotically fastest knownmethod (i.e. Schönhage-Strassen multiplication [7]) of integermultiplication. When it comes to the utilization of spectral techniquesfor modular operations, the current approach is summarized as “performthe integer multiplication in the frequency domain (favorablyFFT+convolution) followed by a modular reduction in time domain” (seeFIG. 2). If a modular exponentiation computation is considered, becauseof the forward and backward transformation calculations, spectralmethods becomes inefficient for cryptographic sizes. In fact, thismethod is never used in practice.

We proposed a method of reduction in the frequency domain which replacesthe order of DFT inverse and modular reduction seen in the right-handside of FIG. 2. We named this reduction as Spectral Modular reduction(SMR), the flow diagram of SMR is seen in FIG. 3. The impact of such anapproach can be seen by unfolding the exponentiation spiral with the newsetting. This gives a sequence of operations in which DFT and theinverse of DFT are applied consecutively. Consecutive computation ofthese two transforms is redundant as illustrated in FIG. 4. In otherwords, the costly forward and backward transformation calculations donot need to be performed for every multiplication carried during theexponentiation. Consequently, this new approach decreases the asymptoticcrossovers of the spectral methods to cryptographic sizes whilecomputing the modular exponentiation.

We named this method of modular exponentiation as the Spectral ModularExponentiation (SME). As we work in the frequency domain at all times,we further named the modular multiplication carried in the frequency asSpectral Modular Multiplication (SMM) which is also the core operationfor SME. In FIGS. 5 and 6 we give the flow diagram of SMM and SMErespectively.

1.2 Outline

In Section 2 we describe the SME and SMM as a method of modularexponentiation and modular multiplication of large integers. Since bothSME and SMM describes operations over integer rings these algorithms arereadily usable for cryptosystems such as RSA, DSS and ElGamal. Later, inSection 4, we give a detailed description of how the SMM core can beused for elliptic curve point multiplication operation over primefields. Therefore, our methods are applicable to ECC over prime fieldswith large characteristics.

In Section 3, we introduce how these ideas on can be extended to themultiplicative groups of finite field extensions. Because of securityconcerns, exponentiation operation in extension fields is not popular inuse, hence, in this section only the SMM core for extension fields ispresented. Again, in Section 4, this SMM core will be the basis of theelliptic curve point multiplication operation over extension fields.

Since the material presented in 2 covers the use of spectral methods forprime finite fields. In Section 3 we particularly concentrate on twotypes of fields namely;

-   -   extension fields: GF(q) with q=p^(k), p a small odd special        prime,    -   binary extension fields: GF(2^(k)) thus with q=2^(k) for some        k≧1.

In Section 5, we turn our attention to the architectural discussions.The spectral algorithms yield efficient and highly parallelarchitectures for hardware implementations of public-key cryptosystems.We describe detailed architectures for SMM cores tailored for prime andextension fields in Sections 2 and 3.

2 SPECTRAL ARITHMETIC OVER RING OF INTEGERS

2.1 Basic Notation

While computing c=m^(e) mod n, the exponent e can be of any length, andwe assume that it is a j-bit integer such that j≧128. On the other hand,the integers c, m, n are k-bit integers with k≧512. It is assumed thatk=su for some integers s and u, where u is the wordsize of theimplementation, i.e., the number of bits in a single word. A k-bitnumber is viewed as a vector of s words such that each word is u bits.Let a_(i) be the ith word of the integer a, then the vectorrepresentation means that we express the integer in the radix b=2^(u) asa=(a _(s-1) a _(s-2) . . . a ₁ a ₀)_(b)where 0≦a_(i)≦b−1. This representation implies that

$a = {{\sum\limits_{i = 0}^{s - 1}\;{a_{i}b^{i}}} = {{\sum\limits_{i = 0}^{s - 1}\;{a_{i}2^{u\; i}}} = {a_{0} + {a_{1}2^{u}} + {a_{2}2^{2\; u}} + \ldots + {a_{s - 1}{2^{{({s - 1})}\; u}.}}}}}$We will also treat integers as polynomials with an indeterminate t. Thevalue of the integer is obtained from its polynomial by evaluating it att=b. The above integer can be written in the polynomial notation as

${a(t)} = {{\sum\limits_{i = 0}^{s - 1}\;{a_{i}t^{i}}} = {a_{0} + {a_{1}t} + {a_{2}t^{2}} + \ldots + {a_{s - 1}{t^{({s - 1})}.}}}}$We perform our arithmetic in the finite integer ring Z_(q). If q is aprime, then the underlying structure will be a finite field. Thearithmetic of this ring or field is simply modulo q arithmetic. Anotherimportant assumption we make in the SME method is that a d-point DFTexists over the ring Z_(q). This assumption requires that

-   -   The multiplicative inverse of d exists in Z_(q), which requires        gcd (d,q)=1.    -   A principal dth root of unity exists in Z_(q), which requires        that d divides p−1 for every prime p divisor of q. We will        denote the principal dth root of unity with w.        We will use the notation        A(t)=DFT _(d) ^(w) [a(t)]        to denote the d-point DFT using the dth root of w, transforming        the time-domain polynomial a(t) to the spectral-domain        polynomial A(t). The d-point DFT is performed in the finite ring        Z_(q). Since q, d, and w are fixed, we will also use the simpler        notation A(t)=DFT[a(t)] to denote the DFT operation. If a has s        words in it, then, its polynomial will also have s words. In        general, the value of d is fixed by the ring, and the value of s        is equal to s=┌d/2┐. The empty places in the polynomial a(t),        which are the coefficients of the higher orders of t, are filled        with zeros in a(t) before applying the d-point DFT. Inevitably,        we also need the inverse DFT function, which we denote using the        notation        a(t)=IDFT _(d) ^(w) [A(t)]        or simply as a(t)=IDFT[A(t)]. We also make use of the following        vector in our operations        Γ=[1, w ⁻¹ , w ⁻² , . . . , w ^(−(d-1))].        The polynomial representation of Γ is given as        Γ(t)=1+w ⁻¹ t+w ⁻² t ² + . . . +w ^(−(d-1)) t ^(d-1).        The modulus n is a special number, for example, it is the        product of two primes in RSA, and a prime number in        Diffie-Hellman. Our assumption is that n is always odd. Let n₀        denote the least significant word of n, i.e., n₀=n (mod b),        which is also an odd integer. Let v₀ be the multiplicative        inverse of n₀ modulo b, i.e.,        v ₀ =n ₀ ⁻¹(mod b).        We need to use an integer multiple of n in our method, which we        denote by θ, and is derived from n such that        θ=v ₀ n.        Note that since v₀n₀=1 (mod b), the least significant word of θ        is equal to 1, i.e., θ₀=1. The inverse v₀=n₀ ⁻¹ (mod b) can be        computed using the extended Euclidean algorithm, and we will        give an explicit algorithm to compute it in this paper.

Let λ=b^(d) (mod n). We name this integer as the Montgomery coefficient,since it is used inside the Montgomery multiplication algorithm. Thenumber b^(d) is a 2s-word number, however, λ is an s-word number sinceλ=b^(d) (mod n). We will also use the polynomial representation of λ,which is denoted by λ(t). Furthermore, we also define δ as the square ofλ modulo n. The number δ is also an s-word number since δ=λ² (mod n). Wewill also the polynomial representation of δ, which is denoted by δ(t).

In addition to the usual scalar multiplication, we will also utilize thecomponentwise multiplication of vectors in the ring Z_(q), and denotethis operation with the symbol ⊙ asc(t)=a(t)⊙b(t).Given the vectors a(t)=(a₀, a₁, . . . , a_(d-1)) and b(t)=(b₀, b₁, . . ., b_(d-1)), the resulting vector after the ⊙ operation will be c(t)=(c₀,c₁, . . . , c_(d-1)) such that c_(i)=a_(i)b_(i) (mod q) for i=0, 1, . .. , d−1.

TABLE 1 The symbols used in the SME method. Symbol Meaning Relationshipc, m, e, n Output and input integers c = m^(e) (mod n) k Number of bitsin c, m, n usually k ≧ 512 u Length of a single word k is a multiple ofu s Number of words in c, m, n k = s^(u) b Radix of representation b =2^(u) a(t) Polynomial representation of a a(b) = a d Length of the DFT s= ┌d/2┐ Z_(q) Ring of integers modulo q gcd (d, q) = 1 p A prime whichdivides q d divides p − 1 for every p w Principal dth root of unity inZ_(q) w^(d) = 1 (mod q) DFT_(d) ^(w)[a(t)] d-point DFT of a(t) in Z_(q)A(t) = DFT[a(t)] IDFT_(d) ^(w)[A(t)] Inverse DFT function a(t) =IDFT[A(t)] Γ Γ = [1, w⁻¹, w⁻², . . . , w^(−(d−1))] w is dth root ofunity n Modulus n is odd n₀ Least significant word of n n₀ = n (mod b)v₀ Inverse of n₀ modulo b v₀ = n₀ ⁻¹ (mod b) θ v₀ multiple of n θ = v₀nand θ₀ = 1 λ Montgomery coefficient λ = b^(d) (mod n) δ Square of λmodulo n δ = λ² (mod n) ⊙ Componentwise c(t) = a(t) ⊙ b(t)multiplication in Z_(q)2.2 Spectral Modular Exponentiation MethodThe Spectral Modular Exponentiation (SME) method relies on the SpectralModular Multiplication (SMM) method, which is described in detailed inthe following section. It also relies on the d-point Discrete FourierTransform function in Z_(q) and the function DFT_(d) ^(w)[·] needs to beavailable. However, the DFT function is used only 5 times in the SMEmethod, and thus, the efficiency of the DFT implementation is not verycrucial for the efficiency of the SME method.

Furthermore, any addition chain (exponentiation) method can be used inthe SME method. We illustrate it using the binary method, however, moreadvanced algorithms such as the m-ary method, sliding windows method,etc., can also be utilized.

-   Input: The inputs are m, e, and n such that m and n are s words and    the exponent e is j bits.-   Preprocessing: It is often the case that the modulus n is available    before the message m, for example, this is true for the digital    signature algorithms. Therefore, we break the preprocessing into two    stages as follows.-   Preprocessing with n: Given n, obtain n₀.    -   1. Compute v₀=n₀ ⁻¹ (mod 2^(u)) using the extended Euclidean        algorithm as follows:        v ₀=1        -   for i=2 to u            -   if n₀v₀≧2^(i-1) mod 2^(i) then v₀=v₀+2^(i-1)        -   return v₀    -   2. Compute θ=v₀n and obtain the polynomial θ(t). Note that the        least significant word of θ is 1, i.e., θ₀=1.    -   3. Compute Θ(t)=DFT[θ(t)] using the DFT function. An efficient        implementation of the DFT function, i.e., the FFT (Fast Fourier        Transform) algorithm, may be desired. However, simpler DFT        implementations, for example, the matrix-vector product        implementation can also be utilized. The DFT and the inverse DFT        functions require d and w. Additionally, d⁻¹ mod q is        precomputed and saved.    -   4. Compute the s-word integers λ=b^(d) (mod n) and δ=λ² (mod n)        using integer division. Also obtain the d-word polynomial        representation δ(t) of δ. Again the higher order words are        filled with zeros to make the polynomial d words. We then use        the DFT function to compute Δ(t)=DFT[δ(t)].    -   5. Assign x=1 and obtain the polynomial representation of x(t)        which is equal to x(t)=1. Compute the polynomial X(t)=DFT[x(t)].        This step is simplified using the fact that x(t)=1, and thus, we        do not need to use the DFT function to compute X(t). It is        obtained using assignment as X(t)=1+1·t+ . . . +1·t^(d-1).-   Preprocessing with m: Given m, obtain its polynomial representation    m(t).    -   1. Compute the polynomial M(t)=DFT[m(t)].    -   2. Assign C(t)=X(t). Recall that x(t)=1 and X(t)=DFT[x(t)]=[1,        1, . . . , 1].    -   3. Use the Spectral Modular Multiplication (SMM) method to        multiply M(t) and Δ(t) in order to obtain M(t). We will denote        this step by        M (t)=SMM[M(t),Δ(t)].

The definition and detailed steps of the SMM method are given in thenext section.

-   -   4. Use the SMM method to multiply C(t) and Δ(t) in order to        obtain C(t). We will denote this step by        C (t)=SMM[C(t)Δ(t)].

-   Exponentiation Loop: The exponentiation operation is performed as    soon as the j-bit exponent e is available. Let the binary expansion    of e be (e_(j-1) e_(j-2) . . . e₁e₀)₂. The exponentiation operation    needs C(t) and M(t) as input in addition to the exponent e.    -   for i=j−1 downto 0        C (t)=SMM[ C (t), C (t)]        -   if e_(i)=1 then C(t)=SMM[ C(t), M(t)]

-   Postprocessing: After the exponentiation loop is completed, we will    have a final value of C(t).

This vector will now be brought back to the time domain as follows.

-   -   1. Obtain C(t) using the SMM method by multiplying C(t) and X(t)        as follows        C(t)=SMM[ C (t),X(t)].    -   2. Obtain c(t) using the Inverse DFT function as follows        c(t)=IDFT[C(t)].

-   Output: The integer c representing the polynomial c(t) is the output    such that c=m^(e) (mod n).    2.3 Spectral Modular Multiplication Method    The SMM method takes two arguments, such as A(t) and B(t), and    computes S(t) as the output. We will denote the operation using    R(t)=SMM[A(t),B(t)].    The steps of the SMM method are given below.    -   1: R(t)=A(t)⊙B(t)(mod q)    -   2: α=0    -   3: for i=0 to d−1    -   4: r₀=d⁻¹(R₀+R₁+ . . . +R_(d-1))(mod q)    -   5: β=−(r₀+α)(mod b)    -   6: α=(r₀+α+β)/b    -   7: R(t)=R(t)+β·Θ(t)(mod q)    -   8: R(t)=R(t)−(r₀+β)(mod q)    -   9: R(t)=R(t)⊙Γ(t)(mod q)    -   10: end for    -   11: return R(t)        The steps of the SMM method are explained in detail below. In        addition to the inputs, the SMM function has access to        parameters available after the preprocessing steps of the SME        method. These parameters are d⁻¹, ΘM, and Γ(t). The SME method        performs only modulo q and modulo b operations.    -   1. This is the vector ⊙ operation in the ring Z_(q). Given the        vectors A(t) and B(t), we compute R(t) such that        R_(i)=A_(i)B_(i) (mod q) for i=0, 1, . . . , d−1.    -   2. Initial value of α is assigned as zero.    -   3. The for loop is executed d times for i=0, 1, . . . , d−1.    -   4. After step 1, we have the vector R(t). The elements are        summed modulo q to obtain        R ₀ +R ₁ + . . . +R _(d-1)(mod q),        -   and multiplied by d⁻¹ mod q in order to obtain r₀. The            inverse d⁻¹ was already computed in the preprocessing stage            of the SME method.    -   5. Since b=2^(u), the computation of β=−(r₀+α)mod b is a 1-word        operation, involving an addition and sign-change (2's        complement) operations on 1-word numbers.    -   6. The value of α is updated for the next loop instance as        α=(r₀+α+β)/b.    -   7. The 1-word number β is multiplied with every element of Θ(t)        and the result is added to the corresponding element of R(t).        Given N(t)=(N₀, N₁, . . . , N_(d-1)), we compute the elements of        the new R(t) as        R _(i) =R _(i)+β·Θ_(i)(mod q)        -   for i=0, 1, . . . , d−1.    -   8. This is a vector operation. The 1-word number r₀+β is        subtracted from every element of R(t), and therefore, we have        R _(i) =R _(i)−(r ₀+β)(mod q)        -   for i=0, 1, . . . , d−1.    -   9. This is also a vector operation. The resulting R(t) from the        previous step is multiplied componentwise with the vector Γ(t).        Therefore, the elements of the new R(t) are found as        R _(i) =R _(i)·Γ_(i)(mod q)        -   for i=0, 1, . . . , d−1.    -   10. The end of for loop.    -   11. The multiplication result R(t)=SMM[A(t),B(t)] is returned.        2.4 An Example Exponentiation Using the SME Method        In this section, we give an example exponentiation computation        using the SME method with the input values as m=27182, e=53, and        n=31417. We will describe the steps of the SME method performing        this modular exponentiation operation, giving the temporary        results and the final result c=m^(e) (mod n).

We select the length of a single word as u=4. Therefore, the radix ofthe representation is b=2^(u)=16, i.e., the numbers are represented inhexadecimal. Sincen=(31417)₁₀=(0111 1010 1011 1001)₂=(7AB9)₁₆,we have k=16 and s=4. The polynomial representation of n is given asn(t)=9+11t+10t ²+7t ³,in which the digits are expressed in decimal. The integer valuen=(31417)₁₀=(7AB9)₁₆ is obtained by evaluating n(t) at t=16 in anyselected basis.

We will perform our computations in the Fermat ring Z_(q) where q=2²⁰+1.Since s=4, we need a DFT function in this ring with the length d=8. Itturns out that such DFT exists in this ring with the principal 8th rootof unity given as w=32. Furthermore, the vector Γ(t) is given as

$\begin{matrix}{{\Gamma(t)} = {1 + {w^{- 1}t} + {w^{- 2}t^{2}} + {w^{- 3}t^{3}} + {w^{- 4}t^{4}} + {w^{- 5}t^{5}} + {w^{- 6}t^{6}} + {w^{- 7}t^{7}}}} \\{= {1 + {1015809\mspace{11mu} t} + {1047553\mspace{11mu} t^{2}} + {1048545\mspace{11mu} t^{3}} + {1048576\mspace{11mu} t^{4}} +}} \\{{32768\mspace{11mu} t^{5}} + {1024\mspace{11mu} t^{6}} + {32\mspace{11mu}{t^{7}.}}}\end{matrix}$The steps of the SME method computing this modular exponentiation aredescribed below.

-   Preprocessing with n: Given n=(7AB9)₁₆, we have n₀=9.    -   1. The inversion algorithm computes v₀=9⁻¹ (mod 16) as v₀=9.    -   2. The computation of θ=v₀n gives 9·31417=282753, which is        expressed in binary and in hexadecimal as        θ=(0100 0101 0000 1000 0001)₂=(45081)₁₆.        -   Recall that θ is s+1=5 words and θ₀=1. We also obtain the            polynomial θ(t) as            θ(t)=1+8t+5t ³+4t ⁴.    -   3. The computation of Θ(t)=DFT[θ(t)] is accomplished using the        DFT function. In Section 4, we describe the DFT computation and        particularly evaluate this transform. We obtain the result of        the DFT as        Θ(t)=18+164093t+3077t ²+262301t ³+1048569t ⁴+884478t ⁵+1045510t        ⁶+786270t ⁷.        -   Recall that we work in the finite ring Z_(q) for            q=²²+1=1048577, and thus, the coefficients of the polynomial            Θ(t) are in the range [0, 2²⁰+1).        -   In this step, we also compute and save d⁻¹ (mod q) as            d ⁻¹=8⁻¹(mod 2²⁰+1)=917505.    -   4. We compute the Montgomery coefficient and its square as        λ=16⁸(mod 31417)=12060,        δ=12060²(mod 31417)=14307.        -   The polynomial representation of δ is found using            δ=(14307)₁₀=(37E3)₁₆ as            δ(t)=3+14t+7t ²+3t ³.        -   Furthermore, we obtain the spectral representation of δ(t)            using the DFT as            Δ(t)=27+105923t+11260t ²+451683t ³1048570t ⁴+956996t            ⁵+1037309t ⁶+582564t ⁷.    -   5. Given x(t)=1, the spectral representation of x(t) is found as        X(t)=1+t+t ² +t ³ +t ⁴ +t ⁵ +t ⁶ +t ⁷.-   Preprocessing with m: Given m=(27182)₁₀=(6A2E)₁₆, we have    m(t)=14+2t+10t²+6t³.    -   1. Given m(t), we obtain its spectral representation M(t) using        the DFT as        M(t)=32+206926t+1044485t ²+55502t ³+16t ⁴862159t ⁵+4100t        ⁶+972623t ⁷.    -   2. The initial value of C(t) is given as        C(t)=X(t)=1+t+t ² +t ³ +t ⁴ +t ⁵ +t ⁶ +t ⁷.    -   3. The SMM method is used to compute M(t)=SMM[M(t),Δ(t)] with        inputs        M(t)=32+206926t+1044485t ²+55502t ³+16t ⁴+862159t ⁵+4100t        ⁶+972623t ⁷,        Δ(t)=27+105923t+11260t ²+451683t ³+1048570t ⁴+956996t ⁵+1037309t        ⁶+582564t ⁷.        -   We then use the SMM method to find the resulting polynomial            M(t) given the inputs M(t) and Δ(t). First we execute Step 1            in the SMM method, and obtain the initial value of R(t)            using the rule R_(i)=M_(i)·Δ_(i) (mod q) for i=0, 1, . . . ,            7 as            R(t)=864+866244t+61468t ²+979527t ³+1048465t ⁴+464721t            ⁵+987165t ⁶+834767t ⁷.        -   In Step 2 of the SMM method, We assign the initial value of            α=0, and start the for loop for i=0, 1, . . . , 7. We            illustrate the computation of the instance of the loop for            i=0 in Table 2. The for loop needs to execute for the            remaining values of i as i=1, 2, . . . , 7 in order to            compute the resulting product M(t) which is found as            M (t)=354+463771t+11385t ²+686651t ³+156t ⁴+722398t            ⁵+1037434t ⁶+225086t ⁷.

TABLE 2 The SME method for loop instance i = 0. Step Operation andResult 4: r₀ = d⁻¹ · (R₀ + R₁ + R₂ + R₃ + R₄ + R₅ + R₆ + R₇) (mod q) r₀= 917505 · (864 + 866244 + 61468 + 979527 + 104846 + 464721 + 987165 +834767) (mod 1048567) = 42 5: β = −(r0 + α) (mod b) = −(42 + 0) (mod 16)= 6 6: α = (r₀ + 0 + β)/b = (42 + 6)/16 = 3 7: R_(i) = R_(i) + β · Θ_(i)(mod q) R(t) = 972 + 802225t + 79930t² + 456179t³ + 1048417t⁴ +528704t⁵ + 968763t⁶ + 309502t⁷ 8: R_(i) = R_(i) − (r₀ + β) = R_(i) − 48(mod q) R(t) = 924 + 802177t + 79882t² + 456131t³ + 1048369t⁴ +528656t⁵ + 968715t⁶ + 309454t⁷ 9: R_(i) = R_(i), Γ_(i) (mod q) R(t) =924 + 802177t + 79882t² + 456131t³ + 1048369t⁴ + 528656t⁵ + 968715t⁶ +599495t⁷

-   -   4. In this step, the SMM method is used to compute        C(t)=SMM[C(t),Δ(t)] with inputs        C(t)=1+t+t ² +t ³ +t ⁴ +t ⁵ +t ⁶ +t ⁷,        Δ(t)=27+105923t+11260t ²+451683t ³+1048570t ⁴+956996t ⁵+1037309t        ⁶+582564t ⁷.        -   We will not give the details of this multiplication since it            is similar to the previous one. The result is obtained as            C (t)=342+472129t+17495t ²+875041t ³+132t ⁴+730372t            ⁵+1031256t ⁶+20260t ⁷.

-   Exponentiation Loop: The loop starts with the values of M(t) and    C(t) computed above as    M (t)=354+463771t+11385t ²+686651t ³+156t ⁴+722398t ⁵+1037434t    ⁶+225086t ⁷.    C (t)=342+472129t+17495t ²+875041t ³+132t ⁴+730372t ⁵+1031256t    ⁶+20260t ⁷.    -   Given the exponent value e=(53)₁₀=(110101)₂, the exponentiation        algorithm performs squarings and multiplications using the SMM        method. Since j=6, the value of i starts from i=5 and moves down        to zero, and computes the new value of C(t) using the binary        method of exponentiation as described. The steps of the        exponentiation and intermediate values of C(t) are tabulated in        Table 3. The final value is computed as        C (t)=123+34099t+40979t ²+229426t ³+43t ⁴+31539t ⁵+1007636t        ⁶+753717t ⁷.

TABLE 3 The steps of the exponentiation loop. i e_(i) Operation C(t)Start 342 + 472129t + 17495t² + 875041t³ + 132t⁴ +730372t⁵ + 1031256t⁶ +20260t⁷ 5 C(t) = SMM[ C(t), C(t)] 270 + 44476t + 108628t² + 286841t³ +58t⁴ + 37692t⁵ + 940117t⁶ + 680064t⁷ 1 C(t) = SMM[ C(t), M(t)] 288 +741281t + 71696t² + 769759t³ + 68t⁴ + 473378t⁵ + 976913t⁶ + 113124t⁷ 4C(t) = SMM[ C(t), C(t)] 348 + 869774t + 81984t² + 183179t³ + 92t⁴ +338831t⁵ + 966721t⁶ + 705938t⁷ 1 C(t) = SMM[ C(t), M(t)] 297 + 42796t +20613t² + 615596t³ + 129t⁴ + 39470t⁵ + 1028230t⁶ + 351407t⁷ 3 C(t) =SMM[ C(t), C(t)] 123 + 34099t + 40979t² + 229426t³ + 43t⁴ + 31539t⁵ +1007636t⁶ + 753717t⁷ 0 123 + 34099t + 40979t² + 229426t³ + 43t⁴ +31539t⁵ + 1007636t⁶ + 753717t⁷ 2 C(t) = SMM[ C(t), C(t)] 336 + 857269t +74835t² + 1014675t³ + 94t⁴ + 326774t⁵ + 973908t⁶ + 947609t⁷ 1 C(t) =SMM[ C(t), M(t)] 183 + 162592t + 51267t² + 691423t³ + 67t⁴ + 945569t⁵ +997444t⁶ + 297954t⁷ 1 C(t) = SMM[ C(t), C(t)] 348 + 869774t + 81984t² +183179t³ + 92t⁴ + 338831t⁵ + 966721t⁶ + 705938t⁷ 0 348 + 869774t +81984t² + 183179t³ + 92t⁴ + 338831t⁵ + 966721t⁶ + 705938t⁷ 0 C(t) = SMM[C(t), C(t)] 297 +42796t + 20613t² + 615596t³ + 129t⁴ + 39470t⁵ +1028230t⁶ + 351407t⁷ 1 C(t) = SMM[ C(t), M(t)] 123 + 34099t + 40979t² +229426t³ + 43t⁴ + 31539t⁵ + 1007636t⁶ + 753717t⁷

-   Postprocessing: After the exponentiation loop is completed, we have    the final value C(t). In this step, we have two consecutive SMM    executions.    -   We obtain C(t) using C(t)=SMM[ C(t),X(t)] using the inputs        C (t)=123+34099t+40979t ²+229426t ³+43t ⁴+31539t ⁵+1007636t        ⁶+753717t ⁷.        X(t)=1+t+t ² +t ³ +t ⁴ +t ⁵ +t ⁶ +t ⁷.    -   This computation finds C(t) as        C(t)=312+76043t+36932t ²+58506t ³+112t ⁴+71693t ⁵+1011781t        ⁶+842895t ⁷,        -   We obtain c(t) using the inverse DFT function            c(t)=IDFT[C(t)], which gives            c(t)=9+4t+3t ²+6t ³.        -   Thus, we obtain the final value as c=(6349)₁₆=(25417)₁₀,            which is equal to            25417=227182⁵³(mod 31417)        -   as required.            2.5 Computation of Discrete Fourier Transform in Z_(q)            The DFT of a sequence a=[a₀, a₁, . . . , a_(d-1)] is defined            as the sequence A=[A₀, A_(i), . . . , A_(d-1)] such that

${A_{j} = {\sum\limits_{i = 0}^{d - 1}\;{a_{i}w_{d}^{i\; j}\mspace{14mu}\left( {{mod}\mspace{14mu} q} \right)}}},$where w is the dth root of unity. The DFT function is normally used overthe field of complex numbers C, however, in our application, we need afinite ring or field since we cannot perform infinite precisionarithmetic.

The above sum can also be written as a matrix-vector product as

$\begin{bmatrix}A_{0} \\A_{1} \\A_{2} \\\vdots \\A_{d - 1}\end{bmatrix} = {{\begin{bmatrix}1 & 1 & 1 & \cdots & 1 \\1 & w & w^{2} & \cdots & w^{d - 1} \\1 & w^{2} & w^{4} & \cdots & w^{2{({d - 1})}} \\\vdots & \vdots & \vdots & \vdots & \vdots \\1 & w^{d - 1} & w^{2{({d - 1})}} & \cdots & w^{{({d - 1})}{({d - 1})}}\end{bmatrix}\begin{bmatrix}a_{0} \\a_{1} \\a_{2} \\\vdots \\a_{d - 1}\end{bmatrix}}\mspace{14mu}{\left( {{mod}\mspace{14mu} q} \right).}}$We denote this matrix-vector product by A=Ta, where T is the d×dtransformation matrix. The inverse DFT is defined as a=T⁻¹A. It turnsout that the inverse of T is obtained by replacing w with w⁻¹ in thematrix, and by placing the multiplicative factor d⁻¹ in front of thematrix. The inverse matrix is given as

$T^{- 1} = {{d^{- 1} \cdot \begin{bmatrix}1 & 1 & 1 & \cdots & 1 \\1 & w^{- 1} & w^{- 2} & \cdots & w^{- {({d - 1})}} \\1 & w^{- 2} & w^{- 4} & \cdots & w^{{- 2}{({d - 1})}} \\\vdots & \vdots & \vdots & \vdots & \vdots \\1 & w^{- {({d - 1})}} & w^{{- 2}{({d - 1})}} & \cdots & w^{{- {({d - 1})}}{({d - 1})}}\end{bmatrix}}\mspace{14mu}{\left( {{mod}\mspace{14mu} q} \right).}}$The matrix-vector product definition of the DFT implies an algorithm tocompute the DFT function, however, it requires d multiplications and d−1additions to compute an entry of the output sequence A. Thus, the totalnumber of multiplications is d², and the total number of additions isd(d−1). This complexity is acceptable for the SME method since we needto use the DFT or IDFT functions only 4 times. If desired, the FastFourier Transform (FFT) algorithm can be used which reduces thecomplexity from O(d²) to O(d log d).

We now derive the transformation and inverse transformation matrices forthe example DFT in Section 4, and perform a DFT computation forillustrating the properties of the arithmetic of the DFT in Z_(q). Inour example, we have q=2²⁰+1, d=8, and w=32. The elements of thetransformation matrix T are of the form w^(i·j) (mod q) for i, j=0, 1, .. . , 7. We compute these values and place it in the transformationmatrix as follows:

$T = {\begin{bmatrix}1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\1 & 32 & 1024 & 32768 & 1048576 & 1048545 & 1047553 & 1015809 \\1 & 1024 & 1048576 & 1047553 & 1 & 1024 & 1048576 & 1047553 \\1 & 32768 & 1047553 & 32 & 1048576 & 1015809 & 1024 & 1048545 \\1 & 1048576 & 1 & 1048576 & 1 & 1048576 & 1 & 1048576 \\1 & 1048545 & 1024 & 1015809 & 1048576 & 32 & 1047553 & 32768 \\1 & 1047553 & 1048576 & 1024 & 1 & 1047553 & 1048576 & 1024 \\1 & 1015809 & 1047553 & 1048545 & 1048576 & 32768 & 1024 & 32\end{bmatrix}.}$The inverse transformation matrix T⁻¹ can be obtained similarly usingthe multiplicative inversesd ⁻¹=8⁻¹(mod 2²⁰+1)=917505.w ⁻¹=32⁻¹(mod 2²⁰+1)=1015809.We obtained it as

$T^{- 1} = {\begin{bmatrix}1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\1 & 1015809 & 1047553 & 1048545 & 1048576 & 32768 & 1024 & 32 \\1 & 1047553 & 1048576 & 1024 & 1 & 1047553 & 1048576 & 1024 \\1 & 1048545 & 1024 & 1015809 & 1048576 & 32 & 1047553 & 32768 \\1 & 1048576 & 1 & 1048576 & 1 & 1048576 & 1 & 1048576 \\1 & 32768 & 1047553 & 32 & 1048576 & 1015809 & 1024 & 1048545 \\1 & 1024 & 1048576 & 1047553 & 1 & 1024 & 1048576 & 1047553 \\1 & 32 & 1024 & 32768 & 1048576 & 1048545 & 1047553 & 1015809\end{bmatrix}.}$We will now perform the DFT computation Θ(t)=DFT[θ(t)] with the inputpolynomialθ(t)=1+8t+5t ³+4t ⁴,as mentioned in Section 4. This polynomial is expressed as a vectorθ=[1,8,0,5,4,0,0,0]and enters the DFT function, producing the output vector Θ. Using thematrix-vector product algorithm, we obtain the vector Θ as

${\begin{bmatrix}1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\1 & 32 & 1024 & 32768 & 1048576 & 1048545 & 1047553 & 1015809 \\1 & 1024 & 1048576 & 1047553 & 1 & 1024 & 1048576 & 1047553 \\1 & 32768 & 1047553 & 32 & 1048576 & 1015809 & 1024 & 1048545 \\1 & 1048576 & 1 & 1048576 & 1 & 1048576 & 1 & 1048576 \\1 & 1048545 & 1024 & 1015809 & 1048576 & 32 & 1047553 & 32768 \\1 & 1047553 & 1048576 & 1024 & 1 & 1047553 & 1048576 & 1024 \\1 & 1015809 & 1047553 & 1048545 & 1048576 & 32768 & 1024 & 32\end{bmatrix}\left\lbrack \begin{matrix}1 \\8 \\0 \\5 \\4 \\0 \\0 \\0\end{matrix} \right\rbrack} = {\left\lbrack \begin{matrix}18 \\164093 \\3077 \\262301 \\1048569 \\884478 \\1045510 \\786270\end{matrix} \right\rbrack.}$Recall that all multiplications and additions are performed modulo qwhere q=2²⁰+1. We express Θ as a polynomialΘ(t)=18+164093t+3077t ²+262301t ³+1048569t ⁴+884478t ⁵+1045510t⁶+786270t ⁷which was the result obtained in Step 4 of ‘Preprocessing with n’ inSection 4.2.6 Parameter SelectionIn this section, we describe the methodology for selecting theparameters for the DFT and SME functions in order to apply the method inpublic-key cryptography. We will tabulate some example parameters formodular exponentiations using the SME method, starting from 530 bits upto 4,110 bits. We also tabulate typical rings and their DFT parametersfor use in the SME function.

A ring for which q is of the form 2^(v)±1 is the most suitable for theSME computation since the modular arithmetic operations for such q aresimplified. The rings of the form 2^(v)−1 are called the Mersenne rings,while the rings of the form 2^(v)+1 are called the Fermat rings. InTable 16 and Table 17 in the Appendix Section of this document, wetabulate the Fermat and Mersenne rings suitable for the SME function.Furthermore, we also tabulate a root of unity and the DFT length foreach ring. Whenever possible, we select the root of unity as w=2 or w=−2since multiplication with such numbers are accomplished by shifting.

The Mersenne and Fermat rings are not the only suitable rings for theSME method. Let q″ (not necessarily a prime) be a small divisor of q′.The rings of the form Z_(q′/q″) are also quite useful. Since q″ dividesq′, the arithmetic modulo (q′/q″) can be carried in the ring Z_(q′). Byselecting Z_(q′) as a Mersenne or Fermat ring, we also simplify thearithmetic. Such rings are called pseudo Mersenne or pseudo Fermatrings. We also propose the use of such rings in the SME method. In Table18A, 18B, 18C and Tables 19A, 19B, we tabulate the pseudo Mersenne andFermat rings together with their root of unity and the length of the DFTvalues.

Once the underlying ring and the DFT length and the root of unity areselected, the maximum modulus size in the SME method can be computed byfinding the radix b. The relation between these parameters is given bythe following inequality

$\begin{matrix}{b^{4} < {\frac{3\; q}{s^{3}}.}} & (1)\end{matrix}$In Table 4, we give some example rings and their parameters. Toillustrate the methodology, we select a ring from this table, e.g.,q=q′/q″=(2⁵⁷−1)/7. This ring comes with the root of unity w=−2 and thelength s=57. Using the q and s values and the above inequality, wecompute

$b < \sqrt[4]{\frac{3 \cdot \left( {2^{57} - 1} \right)}{7 \cdot (57)^{3}}} \approx {759.93.}$Therefore, we find u=└ log₂(b)┘=9. Since s=57, we find the maximum bitlength of the exponentiation as k=s·u=57·9=513 as given in Table 4.

TABLE 4 Parameter selection. Bits Ring DFT Root Wordsize Words k q d w us 513  (2⁵⁷ − 1)/7 114 −2 10 53 518  2⁷³ − 1 73 2 14 37 518  (2⁷³ + 1)/373 4 14 37 768  2⁶⁴ + 1 128 2 12 64 1,185  2⁷⁹ − 1 158 −2 15 79 1,7922¹²⁸ + 1 128 2 28 64 2,060 (2¹⁹³ + 1)/3 206 2 20 103 2,163 2¹⁰³ − 1 206−2 21 103 3,456 (2¹²⁸ + 1) 256 −2 27 128 4,170 2¹³⁹ − 1 274 −2 30 1392.7 Improved Parameter SelectionThe SMM method can be improved with a simple arrangement. It is possibleto replace the multiplication β·Θ(t) in Step 7 of the SMM method withadditions at a cost of precomputations and storage space. This approachincreases the wordsize and allows the selection of a smaller ring for asimilar modulus size.

Let θ^(i)(t) be the polynomial representation of an integer multiple ofn such that θ₀ ^(i)=2^(i-1) for i=1, 2, . . . , u. We can now writeβ·Θ(t) as

$\begin{matrix}{{{\beta \cdot {\Theta(t)}} = {\sum\limits_{i = 1}^{u}\;{\beta_{i} \cdot {\Theta^{i}(t)}}}},} & (2)\end{matrix}$where β_(i) is a binary digit of β and Θ^(i)(t)=DFT[θ^(i)(t)] for i=1,2, . . . , u. Note that β<2^(u) and β_(i)=0 for i≧u. The polynomial set{Θ¹(t), Θ²(t), . . . , Θ^(u)(t)} needs to be precomputed and stored. Theprecomputation can be done in the Preprocessing with n phase: Startingwith multiplying θ¹(t) (which is θ(t) of the SMM) by powers of 2 afterfinding v₀. We then apply the DFT function to these polynomials in orderto get Θ^(i)(t)=DFT[Θ^(i)(t)] for i=1, 2, . . . , u. With thisadjustment a better bound for b could be given as

$\begin{matrix}{{b^{2}{\log_{2}(b)}} < {\frac{3\; q}{s^{3}}.}} & (3)\end{matrix}$This bound gives us the improved parameters which are tabulated in Table5 below.

TABLE 5 Improved parameter selection. Bits Ring DFT Root Wordsize Wordsk q d w u s 540 (2⁵⁹ + 1)/3 59 2 19 30 564  2⁴⁷ − 1 94 −2 12 47 570  2⁵⁹− 1 59 2 19 30 620  2⁶¹ − 1 61 2 20 31 672  2⁶⁴ + 1 64 4 21 32 1,098 2⁶¹ − 1 122 2 18 61 1,120  2⁷⁹ − 1 79 2 28 40 1,216  2⁶⁴ + 1 128 2 1964 2,054  2⁷⁹ − 1 158 2 26 79 2,160 2¹⁰⁷ − 1 107 2 40 54 3,200 2¹²⁸ + 1128 4 50 64 4,173 2¹⁰⁷ − 1 214 −2 39 107 6,272 2¹²⁸ + 1 256 2 49 128

Furthermore, in Tables 4 & 5, there are two important issues that shouldbe considered for an efficient design. First issue is the number ofwords (s) or the DFT length (d) which are related to one another bys=┌d/2┐. Observe that the maximum modulus size is given as k=su.Moreover, the loop in the SMM algorithm runs d times. Therefore, adecrease in d is desirable for some designs even it is at a cost ofusing larger rings (q) and requiring some more storage space. In Table6, we demonstrate parameters of this nature.

TABLE 6 Decreasing d. Bits Ring DFT Root Wordsize Words k q d w u s 540(2¹¹⁵ − 1)/31 23 32 45 12 560  (2⁹³ + 1)/9  31 64 35 16 608 2⁹⁶ + 1 3264 38 16 1,024 (2¹⁵⁵ + 1)/33 31 1024 64 16 1,122 (2¹²⁹ + 1)/9  43 64 5122 2,150 (2¹²⁹ + 1)/9  86 64 50 43

The second issue is the wordsize u which determines the maximum modulussize k as k=su and the number of elements in the set {Θ¹(t), Θ²(t), . .. , Θ^(u)(t)} which needs to be stored. For instance, in the ringq=2⁴⁷−1, the value of u=12 implies that {Θ¹(t), Θ²(t), . . . , Θ¹²(t)}need to be stored. Here Θ^(i)(t) is a sequence of length 86 whoseelements are from the ring q=2⁴⁷−1 for all i=1, 2, . . . , 12. Thisstorage requirements of the above strategy can be excessive, however, ahybrid strategy is also possible. This can be summarized with thefollowing equation

${{{\beta \cdot \Theta}(t)} = {{\beta^{\prime} \cdot {\Theta^{1}(t)}} + {\sum\limits_{i = u^{\prime}}^{u}{\beta_{i} \cdot {\Theta^{i}(t)}}}}},$where β′=β mod 2^(u′) and β_(i) stands for binary digits of β÷2^(u′) forsome 0≦u′≦u.2.8 The Use of Chinese Remainder Theorem (CRT)In many situations it is desirable to break a congruence mod n into asystem of small congruences mod factors of n. Once computations areperformed in the small factor rings, by using CRT, the resultant systemof congruences is replaced by a single congruence under certainconditions.

Chinese Remainder Theorem. For i≧2, let p₁, p₂, . . . , p_(l) benon-zero integers which are pairwise relatively prime:gcd(p_(i),p_(i))=1 for i≠j. Then, for any integers a₁, a₂, . . . ,a_(l), the system of congruencesx≡a ₁ mod p ₁ , x≡a ₂ mod p ₂ , . . . , x≡a _(l) mod p _(l),has a solution, and this solution is uniquely determined modulo p₁ p₂ .. . p_(l). We rather interested in computation (lift) of finalcongruence instead of proving CRT. Although there are more efficientways of lifting, we consider the single-radix conversion (SRC) method.Going back to the general system, SRC algorithm computes x using thefollowing summation for given a₁, a₂, . . . , a_(l) and p₁, p₂, . . . ,p_(l).

$\begin{matrix}{{x = {\sum\limits_{i = 1}^{l}\;{a_{i}c_{i}{n_{i}^{\prime}\left( {{mod}\; n} \right)}}}},{{{where}\mspace{14mu} n_{i}^{\prime}} = {{p_{1}p_{2}\mspace{14mu}\ldots\mspace{14mu} p_{i - 1}p_{i + 1}\mspace{14mu}\ldots\mspace{14mu} p_{l}} = \frac{n}{p_{i}}}}} & (4)\end{matrix}$and c_(i) is the multiplicative inverse of n′_(i) mod p_(i).

When the SME is considered CRT can be used in two different ways. Thefirst one is for degree where the second one is for radius. We startwith the first one.

2.8.1 CRT for Degree

When n is composite number such as the RSA modulus n, the CRT is verybeneficial if the prime factorization of n is known (e.g. RSAdecryption). This method is easily adopted to SME.

Suppose that n=p₁p₂ is a typically RSA modulus with two large distinctprime factors p₁ and p₂. The computation m^(e) mod n can be put into thesystem of two small congruences as follows:m ₁ :=c ^(e) mod p ₁m ₂ :=c ^(e) mod p ₂However, applying Fermat's theorem to the exponents, we only need tocomputem ₁ :=c ^(e) ¹ mod p ₁m ₂ :=c ^(e) ² mod p ₂where e₁:=e mod(p₁−1) and e₂:=e mod(p ₂−1).

Observe that the computation of m₁ and m₂ is performed by using the twoseparate SME with inputs m₁, e₁ and m₂, e₂ respectively and when theyare computed, proceeding with the SRC algorithm, we achieve m by usingthe Sum (4)m=m ₁ c ₁ p ₂ +m ₂ c ₂ p ₁ mod n  (5)where c₁=p₂ ⁻¹ mod p₁ and c₂=p₁ ⁻¹ mod p₂.2.8.2 CRT for RadiusCRT is proposed to use for integer multiplication by J. M. Pollard [6]and independently by A. Schönhage and V. Strassen [7] who furtherrecommend to use FFT over Fermat rings

_(q) with q=2² ^(r) +1 and d=2^(r) for some r>0, they proved the famoustime bound O(n log n log log n) for integer multiplication, with somecare their ideas can be applied to spectral modular algorithms.

Let us give an example; suppose that x=10+5t+t² and y=2+3t+3t²,convolving x and y givesz=x⊙y=20+9t+20t ²+24t ³+9t ⁴ mod 31If the same computation is done modulo 2z=x⊙y=0+t+0·t ²+0·t ³ +t ⁴ mod 2is computed and this suffices to recover the real coefficients z=(20,40, 51, 24, 9) from the congruences z=(20, 9, 20, 24, 9)mod 31 and z=(0,0, 1, 0, 1)mod 2 by using CRT on the coefficients. For instance, thesecond term can be found by solving the congruencesz ₂≡9 mod 31, and z ₂≡0 mod 2 which gives z ₂≡40 mod 62by using the Equation (5).

When it comes to the spectral modular operations the methodology changesslightly. One has to recover the actual least significant time digit atevery step of the reduction process.

Let p₁, p₂, . . . , p_(l) be positive, pairwise relatively prime numberssuch that p₁ p₂ . . . p_(l)>q. Moreover, assume that for all j=1, 2, . .. , l there exists a length d_(j) DFT over

_(p) _(j) such that d_(j)≧d. Let for j=1, 2, . . . , lx _(j)(t)=x _(j0) +x _(j1) t+ . . . +x _(j(d) _(j) ₋₁₎ t ^(d) ^(j) ⁻¹≡x(t)mod p _(j),y _(j)(t)=yy _(j0) +j _(j1) t+ . . . +y _(j(d) _(j) ₋₁₎ t ^(d) ^(j) ⁻¹≡y(t)mod p _(j),n _(j)(t)=n _(j0) +n _(j1) t+ . . . +n _(j(d) _(j) ₋₁₎ t ^(d) ^(j) ⁻¹≡n(t)mod p _(j)be the transform pairs of X_(j)(t), Y_(j)(t), and N_(j)(t) respectively.If X(t)=(X₁(t), X₂(t), . . . , X_(l-1)(t)) and Y(t)=(Y₁(t), Y₂(t), . . ., Y_(l-1)(t)) stand for vectors of spectral polynomials, the followingprocedure computes Z(t)=(t), (Z₁(t), Z₂(t), . . . , Z_(l-1)(t)).

-   -   1: Z(t):=X(t)⊙Y(t)    -   2: α:=(0, 0, . . . , 0), vector zero of dimension l    -   3: for i=0 to d−1    -   4: for j=1 to l    -   5: z_(j0):=d_(j) ⁻¹·(Z_(j0)+Z_(j1)+ . . . +Z_(jd))mod p_(j)    -   6: end for    -   7: z₀:=Σ_(j=1) ^(l)z_(j0)c_(j)p′_(j)(mod q)    -   8: for j=1 to l    -   9: β_(j):=(−(z₀+α_(j))mod b)mod p_(j)    -   10: α_(j):=(z_(0j)α_(j)+β_(j))/b mod p_(j)    -   11: Z_(j)(t):=Z_(j)(t)+β_(j)·N_(j)(t)mod p_(j)    -   12: Z_(j)(t):=Z_(j)(t)−(z_(0j)+β_(j))(t)mod p_(j)    -   13: Z_(j)(t):=Z_(j)(t)⊙Γ_(j)(t)mod p_(j)    -   14: end for    -   15: end for    -   16: Z(t):=Z(t)+α(t)    -   17: return Z(t)

Once Z(t)=(Z₁(t), Z₂(t), . . . , Z_(l-1)(t)) is computed, after applyingthe inverse DFT, one recovers z(t)=x(t)⊙y(t) from small congruences orin case of an exponentiation keep using the above core consecutively aswe described in Section 2.2.

In Tables 20, 21A and 21B in the Appendix Section of this document, somenice rings that CRT can be applied. As before most popular RSA sizes aretargeted.

3 SPECTRAL ARITHMETIC FOR EXTENSION FIELDS

3.1 Binary Extension Fields

In this section we describe methods of carrying arithmetic for binaryextension fields by using spectral techniques. DFT and convolution isused to compute the multiplications in the binary extension fieldGF(2^(k)) for some k>0.

One way of representing the elements of GF(2^(k)) is by binarypolynomials of degree k−1 or less. If k is a composite number, i.e. k=sufor some integers s and u, the field GF(2^(k)) is an extension field forGF(2^(u)). Since GF(2^(k)) is isomorphic to GF((2^(u))^(s)), theelements of GF(2^(k))≈GF((2^(u))^(s)) can also be represented bypolynomials over GF(2^(u)) of degree s−1 or less. That is equivalent tosay that GF((2^(u))^(s)) is isomorphic to GF(2^(k))/(n(t)) for someirreducible polynomial n(t) of degree s−1 over GF(2^(u)).

It is possible to embed the extension fields into some ring structurethat admits a DFT transform. Let

${{a(t)} = {{\sum\limits_{i = 0}^{s - 1}\;{a_{i}t^{i}}} = {a_{0} + {a_{1}t} + {a_{2}t^{2}} + \ldots + {a_{s - 1}t^{s - 1}}}}},{a_{i} \in {G\;{F\left( 2^{u} \right)}}},$R:=Z₂[γ]/(f(γ)) be a polynomial ring for some binary polynomial f(γ) ofdegree v (not necessarily irreducible) and w be a principal s root ofunity in R. Then one can define the transform of the sequence (a_(i))(0≦i≦s−1) of members of R to be the sequence (A_(i)) where

$\begin{matrix}{A_{i} = {\sum\limits_{j = 0}^{s - 1}\;{a_{i} \cdot w^{i\; j}}}} & (6)\end{matrix}$This is analogous to DFT over finite integer rings but here theoperations are carried over polynomial ring R instead of in the integerring Z_(q).

The inverse transform to (6) is given by

$a_{i} = {d^{- 1} \cdot {\sum\limits_{j = 0}^{s - 1}\;{A_{i} \cdot w^{{- i}\; j}}}}$The inverse DFT map is well defined if

-   -   The multiplicative inverse of d exists in R, which implies gcd        (d, 2)=1.    -   A principal dth root of unity exists in R, which requires that d        divides |R|−1 where |R| denotes the order of the ring R (i.e.        the number of elements of R) for every prime p divisor of |R|−1.        We denote the principal dth root of unity with w.

As before, we will use a polynomial notation;A(t)=DFT _(d) ^(w) [a(t)],a _(i) εGF(2^(u)) and A _(i) εRdenotes the d-point DFT using the dth root of w with inverse transformdenoted by a(t)=IDFT_(d) ^(w)[A(t)] (or simply as a(t)=IDFT[A(t)]).3.2 Spectral Modular Multiplication for Binary Extension Fields(SMM_(—)2^(k))We describe the SMM_(—)2^(k) method as an adaptation of the SMM tobinary extension fields. We try to use the same notation for simplicitybut be aware that since the underlying structure is a polynomial rings,most of the time a polynomial arithmetic is used for computations. Inour presentation we make use of the following polynomial in ouroperationsΓ(t)=1+w ⁻¹ t+w ⁻² t ² + . . . +w ^(−(d-1)) t ^(d-1).The defining polynomial n(t) might be a special irreducible polynomialin order to have some simplified reductions. Our assumption is that n(t)is either a trinomial or a pentanomial. Let n₀ denote the constant termof n(t), Since gcd(n₀,γ^(u))=1, let v₀ be the multiplicative inverse ofn₀ modulo γ^(u), i.e.,v ₀ =n ₀ ⁻¹(mod γ^(u)).Whenever n(t) is chosen as a trinomial or a pentanomial having n₀≠1. Weneed to use some multiples of n(t) in our method, which we denote by θ,and is derived from n(t) such thatθ=v ₀ n(t).Note that since v₀n₀=1 (mod γ^(u)), the constant term of θ becomes 1,i.e., θ₀=1. The inverse v₀=n₀ ⁻¹ (mod γ^(u)) can be computed using theextended polynomial Euclidean algorithm.

TABLE 7 The symbols used in the SMM_2^(k) method. Symbol MeaningRelationship A(t) DFT of an input polynomial a(t) DFT_(d) ^(w)[a(t)] ksize of the original field GF(2^(k)) s Number of coefficients in a(t) k= su, i.e. GF(2^(k)) ≈ GF((2^(u))^(s)) u size of a single coefficient k= su, i.e. GF(2^(k)) ≈ GF((2^(u))^(s)) v bit size of ring R 2^(v−1) ≦|R| ≦ 2^(v), n(t) defining polynomial of GF((2^(u))^(s)) GF((2^(u))^(s))≈ GF(2^(k))/(n(t)) f(γ) defining polynomial of R R ≈ Z₂/(f(γ)) d Lengthof the DFT s = ┌d/2┐ R Polynomial ring modulo f(γ) R ≈ Z₂/(f(γ)) wPrincipal dth root of unity in R w^(d) = 1 (mod f(γ)) n₀ constant termof n(t) n₀ = n(0) v₀ Inverse of no modulo γ^(u) v₀ = n₀ ⁻¹ (mod γ^(u)) θv₀ multiple of n(t) θ = v₀n(t) and θ₀ = 1 ⊙ Component-wise c(t) = a(t) ⊙b(t) multiplication in R^(d) Γ(t) 1 + w⁻¹t + . . . + w^(−(d−1))t^(d−1) wis dth root of unity DFT_(d) ^(w)[a(t)] d-point DFT of a(t) in R A(t) =DFT[a(t)] IDFT_(d) ^(w)[A(t)] Inverse DFT function a(t) = IDFT[A(t)]

In addition to the usual scalar multiplication, we will also utilize thecomponent-wise multiplication of elements of R^(d), and denote thisoperation with the symbol ⊙ asc(t)=a(t)⊙b(t).Given the vectors a(t)=a₀+a₁t+ . . . +a_(d-1)t^(d-1) and b(t)=b₀+b₁t+ .. . +b_(d-1)t^(d-1), the resulting vector after the ⊙ operation will bec(t)=c₀+c₁t+ . . . +c_(d-1)t^(d-1) such that c_(i)(γ)=a_(i)(γ)b_(i)(γ)(mod f(γ)) for i=0, 1, . . . , d−1.

Note that here we use two indeterminate t and γ for representations; γis the indeterminate that is used to represent the elements of thepolynomial ring R where t is used for polynomial representation ofsequences.

The SMM_(—)2^(k) method takes two arguments, such as A(t) and B(t), andcomputes R(t) as the output. We will denote the operation usingR(t)=SMM_(—)2^(k) [A(t),B(t)].The steps of the SMM_(—)2^(k) method are given below. Recall that thesteps of the algorithm are similar to SMM for integer rings, the maindifference is the arithmetic. Since the underlying structure is apolynomial ring we use polynomial arithmetic for computations.

-   -   1: R(t)=A(t)⊙B(t)(mod f(γ))    -   2: α=0    -   3: for i=0 to d−1    -   4: r₀(t)=d⁻¹(R₀+R₁+ . . . +R_(d-1))(mod f(γ))    -   5: β(γ)=−(r₀(γ)+α(γ))(mod γ^(u))    -   6: α(γ)=(r₀(γ)+α(γ)+β(γ))/γ^(u)    -   7: R(t)=R(t)+β·Θ(t)(mod f(γ))    -   8: R(t)=R(t)−(r₀(γ)+β(γ))(t)(mod f(γ))    -   9: R(t)=R(t)⊙Γ(t)(mod f(γ))    -   10: end for    -   11: return R(t)

We remark that in Step 8 (r₀(γ)+β(γ))(t) is a notation for a polynomialwith same coefficients of degree d−1. That is(r ₀(γ)+β(γ))(t)=(r ₀(γ)+β(γ))+(r ₀(γ)+β(γ))t+ . . . +(r ₀(γ)+β(γ))t^(d-1)

Observe that the multiplication of Step 7 can be replaced by

$\begin{matrix}{{{\beta \cdot \Theta}(t)} = {\sum\limits_{i = 1}^{u}\;{\beta_{i} \cdot {{\Theta^{i}(t)}.}}}} & (7)\end{matrix}$Here the set {Θ¹(t), Θ²(t), . . . , Θ^(u)(t)} is the transform of theset {θ¹(t), θ²(t), . . . , θ^(u)(t)} where θ^(i)(t) is the polynomialsuch that θ₀ ^(i)(γ)=γ^(i-1)εR for i=1, 2, . . . , u. This modificationis similar to the one described in Section 2.7 and improves theparameter selection. Since we work over the field Z₂, the arithmetic iscarry free. In fact this makes the relation between the parameters s, uand v simpler (recall that GF(2^(k))≈GF((2^(u))^(s))).

The relations for SMM_(—)2^(k) and improved SMM_(—)2^(k) can be given as4us≦v and 2us≦v respectively.

Remark that the choice of f(γ) is very important for realizations. Whileworking with integers for DFT function the underlying rings are chosenas Fermat or Mersenne rings, moreover the principal root of unity ispicked as a power of two in order to make the computations as simple aspossible. Similar simple constructions are possible while working withDFT over polynomial ring R. For instance, if f(γ) is irreducible Rbecomes a field and many methods for efficient arithmetic can be emergedin this situation. We start with the simplest selections of thepolynomial f(γ).

3.2.1 Spectral Arithmetic Over the Polynomial Rings

The most convenient choice of f(γ) would be the polynomials of typeγ^(n)−1. Note that γ^(n)−1 is not irreducible over Z₂ soR:=Z₂[γ]/(γ^(n)−1) is a ring with multiplicative set order 2^(n)−1. Inthese rings the arithmetic corresponds to ones complement polynomialarithmetic.

As before, our primary objective is to turn the multiplications withroot of unity to circular shifts so we pick w=γ. The suitable rings thatadmit the DFT structure with these choices are tabulated in Tables 8 and9. Observe that k targets key sizes of popular cryptosystems.

TABLE 8 Standard Parameter Selection for SMM_2^(k). Bits Ring Ring DFTRoot Wordsize Words k f(γ) q d w u s 105 γ²⁹ − 1 2²⁹ − 1 29 γ 7 15 171γ³⁷ − 1 2³⁷ − 1 37 γ 9 19 210 γ⁴¹ − 1 2⁴¹ − 1 41 γ 10 21 242 γ⁴³ − 1 2⁴³− 1 43 γ 11 22 288 γ⁴⁷ − 1 2⁴⁷ − 1 47 γ 12 24 351 γ⁵³ − 1 2⁵³ − 1 53 γ13 27 450 γ⁵⁹ − 1 2⁵⁹ − 1 59 γ 15 30 578 γ⁶⁷ − 1 2⁶⁷ − 1 67 γ 17 34

TABLE 9 Improved Parameter Selection for SMM_2^(k). Bits Ring Ring DFTRoot Wordsize Words k f(γ) q d w u s 210 γ²⁹ − 1 2²⁹ − 1 29 γ 14 15 240γ³¹ − 1 2³¹ − 1 31 γ 15 16 342 γ³⁷ − 1 2³⁷ − 1 37 γ 18 19 420 γ⁴¹ − 12⁴¹ − 1 41 γ 20 21 702 γ⁵³ − 1 2⁵³ − 1 53 γ 26 273.2.2 Spectral Arithmetic Over the Finite Fields

In general the selection of f(γ)=γ^(n)−1 and w=γ gives the simplestarithmetic but in some cases different f(γ) may be extremely useful. Forinstance, let w=γ and f(γ)=1+γ+γ²+ . . . +γ^(u) (i.e. All-One-Polynomial(AOP)) be an irreducible polynomial. In [2] it is shown that thearithmetic in R with this selection emerges an AOP reduction which is assimple as a trinomial reduction. For w=γ the roots of unity set is givenas follows{γ, γ², . . . , γ^(u-1), 1+γ+γ²+ . . . +γ^(u-1)}  (8)

Note that f(γ) is chosen to be an irreducible polynomial, thus R isisomorphic to the finite field GF(2^(u)). one can realize the ring R asa vector space over the field GF(2) and consider a suitable basis forthe computations. A basis of the formN={α, α ^(q), . . . , α^(q) ^(n-1) }  (9)is called a normal basis for GF(2^(u)) over GF(2). When normal basisrepresentation is used squaring operation corresponds to a cyclic shift.This is very important for realizations of exponentiation using therepeated square and multiply method.

For every finite field there exists a normal basis, in fact these basisare not unique. One way of stating the difference between these basis isthat they may have different complexities for the multiplicationoperation. Those ones with the minimal complexity are the most importantones for computations (also called optimal normal basis). For ourpurposes a specific class of optimal normal basis are very special,these basis are called type I optimal basis. In construction of type Ioptimal basis α=γ is chosen to be the principal root of unity thereforethe set (9) and root of unity set (8) are equal with a different order.By such a selection of basis, apart from easy squaring, multiplicationswith roots of unity corresponds to shifts without any reduction.Unfortunately not every field has an optimal normal basis since α=γ isnot a principal root of unity for every choice of v for GF(2^(v)). InTable 10 we tabulate the finite fields having type I optimal basis.

TABLE 10 Values at which Type I optimal normal basis exist. n 4 10 12 1828 36 52 58 60 100 106 130

If the parameter selection is reconsidered one can have Tables 11 and 12for SMM_(—)2^(k) algorithm. Note that AOP(m) representsall-one-polynomial having degree m.

TABLE 11 Parameter Selection for SMM_2^(k) over a Finite Field with ONB.Bits Ring Ring DFT Root Wordsize Words k f(γ) q d w u s 98 AOP(28)GF(2²⁸) 28 γ 7 14 162 AOP(36) GF(2³⁶) 36 γ 9 18 364 AOP(52) GF(2⁵²) 52 γ13 26 406 AOP(58) GF(2⁵⁸) 58 γ 14 29 450 AOP(60) GF(2⁶⁰) 60 γ 15 30

TABLE 12 Improved Parameter Selection for SMM_2^(k) over a FiniteFieldwith ONB. Bits Ring Ring DFT Root Wordsize Words k f(γ) q d w u s196 AOP(28) GF(2²⁸) 28 γ 14 14 324 AOP(36) GF(2³⁶) 36 γ 18 18 676AOP(52) GF(2⁵²) 52 γ 26 26 841 AOP(58) GF(2⁵⁸) 58 γ 29 29 900 AOP(60)GF(2⁶⁰) 60 γ 30 303.3 Spectral Modular Multiplication for Extension Fields with SmallCharacteristics (SMM_q^(k))In the previous section we presented spectral algorithms for binaryextension fields. Spectral multiplication can be extremely efficient forextension fields having small characteristics. By “small” we mean thetypical wordsize of todays architectures. For instance if the fieldGF(q^(k)) is considered we assume 2⁷<q<2⁶⁴.

Leta(t)=a ₀ +a ₁ t+a ₂ t ² + . . . +a _(k-1) t ^(k-1) , a _(i) εGF(q)be a polynomial that represents an element of GF(q^(k)) and wεGF(q) be adth primitive root of unity for d>2k . We define the DFT of thepolynomial a(t) with the polynomial A(t) such that

$\begin{matrix}{A_{i} = {\sum\limits_{j = 0}^{d - 1}\;{a_{i} \cdot w^{i\; j}}}} & (10)\end{matrix}$

As before the conditions for the existence of the DFT map is guaranteedby the existence of d⁻¹ (i.e. gcd(d,q)=1 and this is given since q is aprime). Since the arithmetic is modulo q there exist no carries andoverflows hence the Spectral Modular Multiplication for Extension fields(SMM_q^(k)) having small characteristic is simpler.

Let a(t)=a₀+a₁t+a₂t²+ . . . +a_(k-1)t^(k-1), and b(t)=b₀+b₁t+b₂t²+ . . .+b_(k-1)t^(k-1) be in GF(q^(k)) and A(t) and B(t) be their transformpairs respectively. We present the SMM_q^(k) methodR(t)=SMM_(—) q ^(k) [A(t),B(t)]as follows:

-   -   1: R(t)=A(t)⊙B(t)(mod q)    -   2: for i=0 to s−1 (where s=┌d/2┐)    -   3: r₀=d⁻¹(R₀+R₁+ . . . +R_(d-1))(mod q)    -   4: β=−r₀ (mod q)    -   5: R(t)=R(t)+β·Θ(t)(mod q)    -   6: R(t)=R(t)⊙Γ(t)(mod q)    -   7: end for    -   8: return R(t)

Remark. We need to say that like any spectral core described in thisdocument the above algorithm describes a Montgomery type reduction whichneeds inputs in Montgomery form, i.e. R(t) is the transform pair ofr(t)=a(t)·b(t)·t^(−s.)

Since the elements of GF(q^(k)) naturally mapped into the Fourier ringZ_(q) ^(d) and no overflows occurs, a standard reduction is equallyapplicable while using spectral methods for GF(q^(k)). This could savesignificant resources by simply not using the Montgomery representationconversions.

Therefore, if mirror image of a(t) and b(t) are used as inputs toSMM_q^(k), in other words, if a(t)=a₀t^(k1)±a₁t^(k-2)+ . . .+a_(k-2)t+a_(k-1), and b(t)=b₀t^(k-1)+b₁t^(k-2)+ . . .+b_(k-2)t+b_(k-1), the output of SMM_q^(k) algorithm, R(t), simply givesthe transform pair of r(t)=a(t)·b(t). The reason why we use the mirrorimage is because we reduce starting from the most significantcoefficients unlike to Montgomery approach. A last remark is θ(t) shouldbe chosen accordingly when this approach is employed.

When it comes to the relations between the parameters s, u and v onceagain everything is simplified by the absence of the overflows. Animproved method as described with Equation (7) does not serve any boundadvantage over regular SMM_q^(k). For both algorithm we chooseu=log₂q=v. In Table 13 we tabulate some nice rings that target somepopular key sizes k with SMM_q^(k).

TABLE 13 Parameter Selection for SMM_q^(k). Bits Ring DFT Root WordsizeWords s · u q d w u s 153 2¹⁷ − 1 17 2 17  9 169 2¹³ − 1 26 −2 13 13 1902¹⁹ − 1 19 2 19 10 256 2¹⁶ + 1 32 2 16 16 289 2¹⁷ − 1 34 −2 17 17 3612¹⁹ − 1 38 −2 19 19 496 2³¹ − 1 31 2 31 16 961 2³¹ − 1 62 −2 31 31

Sometimes choosing the prime q as a divisor of a number of the form2^(v)±1 can be useful for finding more suitable structures to performSMM_q^(k) method. In general, arithmetic in Z_(q) is difficult; however,since q is a factor of 2^(v)±1, the arithmetic modulo q can be carriedin the ring Z₂ _(v) _(±1). The actual result can be found with a finalmodulo q reduction. Such an approach simplifies the overall computation.In Table 14 we tabulated some suitable rings having such a property.

TABLE 14 Parameter Selection for some Pseudo Mersenne and Fermat Rings.Bits Ring DFT Root Wordsize Words s · u q d w u s 150 (2²⁰ + 1)/17  20 415 10 170 (2¹⁹ + 1)/3  19 4 17 10 204 (2²³ − 1)/47  23 2 17 12 252(2²³ + 1)/3  23 4 21 12 255 (2¹⁷ + 1)/3  34 −2 15 17 300 (2²⁰ + 1)/17 40 −2 15 20 322 (2²⁸ + 1)/17  28 4 23 14 323 (2¹⁹ + 1)/3  38 −2 17 19352 (2³² + 1)/641 32 4 22 16 391 (2²³ − 1)/47  46 −2 17 23 464 (2³¹ +1)/3  31 4 29 16 483 (2²³ + 1)/3  46 2 21 23 644 (2²⁸ + 1)/17  56 2 1423 704 (2³² + 1)/641 64 2 22 32 899 (2³¹ + 1)/3  62 2 29 31

4 SPECTRAL POINT MULTIPLICATION FOR ELLIPTIC CURVE CRYPTOGRAPHY

In this section we demonstrate how SMM method can be used for performingelliptic curve point multiplication operation. Although our presentationreveals the curves over the prime fields, with minor changes thetechnique is completely applicable for curves over GF(q^(k)) includingthe binary extensions. First we briefly describe elliptic curves overfinite fields.4.1 Elliptic Curves over Finite FieldsAn elliptic curve E over a field GF(q^(k)), (for a prime q>3) isdetermined by parameters a, bεGF(q^(k)) which satisfy 4a³+27b²≠0. Thecurve consists of the set of solutions or points

=(x,y) for x,yεGF(q^(k)) to the equationy ² ≡x ³ +ax+b  (11)together with an extra point

called the point at infinity. The set of points on E forms a group underthe following addition rule: Let (x₁,y₁)εE(GF(q^(k))) and(x₂,y₂)εE(GF(q^(k))) be two points such that x₁≠x₂. Then, we have(x₁,y₁)+(x₂,y₂)=(x₃,y₃), whereλ=(y ₂ −y ₁)(x ₂ −x ₁)⁻¹x ₃=λ² −x ₁ −x ₂,y ₃=λ(x ₁ −x ₃)−y ₁.Observe that all computations are performed within the finite fieldGF(q^(k)). Therefore, the efficient finite field arithmetic is utmostinterest for ECC. If the base field structure is chosen as a binaryfield extension the curve equation slightly changes because thecharacteristic q=2. The curve E with parameters b≠0, aεGF(2^(k))consists of the set of pairs in GF(2^(k)) satisfying the equationy ² +xy≡x ³ +ax ² +btogether with an extra point

. Once again, the set of points on E forms a group. The addition rulefollows; let (x₁,y₁)εE(GF(2^(k))) and (x₂,y₂)εE(GF(2^(k))) be two pointssuch that x₁≠x₂. Then, we have (x₁,y₁)+(x₂,y₂)=(x₃,y₃), where

$\left. {{\lambda = \frac{y_{2} + y_{1}}{x_{2} + x_{1}}},{x_{3} = {\lambda^{2} + \lambda + x_{1} + x_{2} + a}},{y_{3} = {{\lambda\left( {x_{1} + x_{3}} \right)} + x_{3} + y_{1}}}} \right).$The above calculation of the pair (x₃,y₃) is different from the caseq>3. In fact, this is the main computational difference in practice.Therefore, a point multiplication method for GF(2^(k)) can be achievedby simply reflecting the above point addition formula.

The security provided by ECC is guaranteed by the difficulty of thediscrete logarithm problem in the elliptic curve group. The discretelogarithm problem is the problem of finding the least positive number,k, which satisfies the equation

1 = ⅇ × 0 = 0 + 0 + … + 0 ︸ e ⁢ ⁢ times ,where

₀ and

₁ are points on the elliptic curve. Naturally, the basic computation(called point multiplication) in ECC is finding the eth (additive) powerof an element

₀ in the group. This involves additions, multiplications, and inversionsof field elements in which are in the coordinates of the points. Thatis, it relies completely upon calculations in the underlying field,GF(q^(k)).

Observe that for k=1, GF(q¹) is a prime field and in Sections 2, wepresent our spectral algorithms as techniques for performing efficientmodular exponentiation. We claim that the spectral methods can beemployed for elliptic point multiplication likewise described for themodular exponentiation. Fortunately, point multiplication favors theefficiency since it is another operation which needs consecutiveemployment of multiplication.

As we mention before we describe the spectral point multiplication overthe prime fields GF(q), before starting this construction we need tointroduce a modification of the so called SMM algorithm.

4.2 Modified Spectral Modular Multiplication Algorithm

It turns out that the least magnitude residue representation of integersin the ring Z_(q) is more suitable for applying the Spectral ModularMultiplication method in elliptic curve cryptography. We represent theintegers in the ring Z_(q) with the set {−q/2, . . . , −1, 0, 1, . . . ,q/2}. In this convention, the modular reduction method picks values fromthe least magnitude set, e.g., 12 mod 7 is equal to −2 instead of 5. Wemake some small changes in the SMM method in order to utilize the leastmagnitude residues properly. The modified spectral multiplicationalgorithm is denoted with the following operationR(t)=SMM2[A(t),B(t)].The detailed steps of the SMM2 method are given below.

-   -   1: R(t)=A(t)⊙B(t)(mod q)    -   2: α=0    -   3: for i=0 to d−1    -   4: r₀=d⁻¹(R₀+R₁+ . . . +R_(d-1))(mod q)    -   5: β=r₀+α(mod b)    -   6: α=(r₀+α)/b    -   7: R(t)=R(t)−β·Θ(t)(mod q)    -   8: R(t)=R(t)−(r₀−β)(mod q)    -   9: R(t)=R(t)⊙Γ(t)(mod q)    -   10: end for    -   11: return R(t)

We will now describe the spectral point multiplication method forelliptic curves. We would like to remark that different representationsof points on elliptic curves brings various realizations of the ellipticcurve system which are suitable for different purposes. The two commonrepresentations are deduced by presenting the curve in the affine andprojective coordinates. The former gives a straightforwardrepresentation involving inversions in the finite field, while thelatter replaces inversions by multiplications. In general, this isdesired since the inversion operation in GF(q) is more time- andresource-intensive operation than the multiplication.

4.3 Spectral Projective Point Multiplication (SPPM)

We describe the spectral point multiplication method for projectivecoordinates. The SPPM method computes

=e×

given the integer e and the point

. The underlying field is a prime field GF(n) (i.e. n is a prime), andtherefore, we need to setup a mod n spectral arithmetic as was the casefor the modular exponentiation operation. The preprocessing step of theSPPM method is essentially the same the preprocessing step of the SMMmethod (Preprocessing with n): Given n, we need to compute v₀, θ, Θ(t),λ, δ, Δ(t), and K(t). After these computations, we start thePreprocessing with

phase, and move into the Exponentiation Loop and Postprocessing phases.

-   Preprocessing with    : Given    =(x,y,z), obtain    (t)=(x(t),y(t),z(t)).    -   1. Compute the point        ′(t)=(X(t),Y(t),Z(t))=(DFT[x(t)],DFT[y(t)],DFT[z(t)]).    -   2. Assign        ′(t)=        ′(t)=(0,K(t),0). Note that        =(0,1,0) is the projective coordinate representation of the        point at infinity and        ′(t)=(0,K(t),0) is its DFT.    -   3. Use the modified Spectral Modular Multiplication (SMM2)        method to compute        ′(t)=( X (t), Y (t), Y (t))        as follows        X (t)=SMM2[X(t),Δ(t)],        Y (t)=SMM2[Y(t),Δ(t)],        Z (t)=SMM2[Z(t),Δ(t)],    -   4. Use the SMM2 method to compute        ′(t)=(0, K (t),0)        -   such that            K (t)=SMM2[K(t),Δ(t)].-   Exponentiation Loop: The exponentiation operation is performed as    soon as the j-bit exponent e is available. Let the binary expansion    of e be (e_(j-1) e_(j-2) . . . e₁e₀)₂. The exponentiation operation    needs    ′(t) and    ′(t) in addition to the exponent e. The exponentiation method (the    point multiplication) method relies on elliptic curve point doubling    and point addition methods. Since we work in the spectral domain and    in the projective coordinate systems, we name these methods as the    Spectral Projective Point Doubling (SPPD) and the Spectral    Projective Point Additions (SPPA) methods.    -   for i=j−1 downto 0        ′(t)=SPPD[        ′(t)]    -   if e_(i)=1 then        ′(t)=SPPA[        ′(t),        ′(t)]-   Postprocessing: After the additive exponentiation loop is completed,    we will have a final value of    -   ′(t)=( X (t), Y (t), Z (t)).    -   This vector now needs to be brought back to the time domain.    -   1. Obtain        ′(t)=(X(t),Y(t),Z(t)) using the SMM2 method by multiplying K(t)        as        X(t)=SMM2[ X (t),K(t)],        Y(t)=SMM2[ Y (t),K(t)],        Z(t)=SMM2[ Z (t),K(t)],    -   2. Obtain        (t)=(x(t),y(t),z(t)) using the Inverse DFT function as follows        x(t)=IDFT[X(t)],        y(t)=IDFT[Y(t)],        z(t)=IDFT[Z(t)],-   Output: The point    (t)=(x(t),y(t),z(t)) is the output of the SPPM method, such that    (t)=e×    (t).    4.4 Spectral Projective Point Addition (SPPA)    Let    ₀(t)=(X₀(t),Y₀(t),Z₀(t)) and    ₁(t)=(X₁(t),Y₁(t),Z₁(t)) be spectral representation of two points on    an elliptic curve E. The SPPA algorithm computes the projective    point addition    ₂=    ₀+    ₁ in the spectral domain. We will denote the operation using    ₂(t)=(X ₂(t),Y ₂(t),Z ₂(t))=SPPA[    ₀(t),    ₁(t)].

The steps of the SPPA method are given below.Ψ₀(t)=SMM2[X ₀(t),SMM2[Z ₁(t),Z ₁(t)]],Ψ₁(t)=SMM2[SMM2[Y ₀(t),Z ₁(t)],SMM2[Z ₁(t),Z ₁(t)]],Ψ₂(t)=SMM2[X ₁(t),SMM2[Z ₀(t),Z ₀(t)]],Ψ₃(t)=SMM2[SMM2[Y ₁(t),Z ₀(t)],SMM2[Z ₀(t),Z ₀(t)]],Ψ₄(t)=Ψ₀(t)−Ψ₂(t),Ψ₅(t)=Ψ₀(t)+Ψ₂(t),Ψ₆(t)=Ψ₁(t)+Ψ₃(t),Ψ₇(t)=Ψ₁(t)+Ψ₃(t),Z ₂(t)=SMM2[Z ₀(t),SMM2[Z ₁(t),Ψ₄(t)]],X ₂(t)=SMM2[Ψ₆(t),Ψ₆(t)]−SMM2[Ψ₅(t),SMM2[Ψ₄(t),Ψ₄(t)]],Ψ₈(t)=SMM2[Ψ₅(t),SMM2[Ψ₄(t),Ψ₄(t)]]−2X ₂(t),2Y ₂(t)=SMM2[Ψ₈(t),Ψ₆(t)]−SMM2[SMM2[Ψ₇(t),Ψ₄(t)],SMM2[Ψ₄(t),Ψ₄(t)]].4.5 Spectral Projective Point Doubling (SPPD)The SPPD algorithm computes the projective doubling 2×

₁(t) operation. We will denote the operation using

₂(t)=(X ₂(t),Y ₂(t),Z ₂(t))=2×

₁(t)=SPPD[

₁(t)].The steps are given below.Ψ₀(t)=3·SMM2[X ₁(t),X ₁(t)]+α·SMM2[SMM2[Z ₁(t)],SMM2[Z ₁(t),Z ₁(t)]],Z ₂(t)=2·SMM2[Y ₁(t),Z ₁(t)],Ψ₁(t)=4·SMM2[X ₁(t),SMM2[Y ₁(t),Y ₁(t)]],X ₂(t)=SMM2[Ψ₀(t),Ψ₀(t)]−2Ψ₁(t),Ψ₂(t)=8·SMM2[SMM2[Y ₁(t),Y ₁(t)],SMM2[Y ₁(t),Y₁(t)]],Y ₂(t)=SMM2[Ψ₀(t),Ψ₀(t)−X ₂(t)]−Ψ₂(t).4.6 Spectral Affine Point Multiplication (SAPM)If the affine coordinates used to represent the curve, the addition anddoubling formulae get simpler but one needs to deal with the inversionsin the finite field. The flow the point multiplication algorithm is sameas the projective case with a fewer coordinates. The preprocessing stepof the SAPM method is exactly the same the preprocessing step of theSPPM method: Given n, we need to compute v₀, θ, Θ(t), λ, δ, Δ(t), andK(t). After these computations, we start the Preprocessing with

phase, and move into the Exponentiation Loop and Postprocessing phases.

-   Preprocessing with    =(x,y): Given    , obtain    (t)=(x(t),y(t)).    -   1. Compute the point        ′(t)=(X(t),Y(t))=(DFT[x(t)],DFT[y(t)]).    -   2. Assign        ′(t)=        ′(t)=(0,K(t)). Note that        =(0,1) is the projective coordinate representation of the point        at infinity and        ′(t)=(0,K(t)) is its DFT.    -   3. Use the SMM2 method to compute        (t)=( X(t), Y(t)), where        X (t)=SMM2[X(t),Δ(t)],        Y (t)=SMM2[Y(t),Δ(t)].    -   4. Use the SMM2 method to compute        ′(t)=( K (t),0)        -   such that            K (t)=SMM2[K(t),Δ(t)].-   Exponentiation Loop: The exponentiation operation is performed as    soon as the j-bit exponent e is available. Let the binary expansion    of e be (e_(j-1) e_(j-2) . . . e₁e₀)₂. The exponentiation operation    needs    ′(t) and    ′(t) as input in addition to the exponent e.    -   for i=j−1 downto 0        ′(t)=SAPD[        ′(t)]        -   if e_(i)=1 then            ′(t)=SAPA[            ′(t),            ′(t)]-   Postprocessing: After the additive exponentiation loop is completed,    we will compute the final value of    ′(t)=( X(t), Y(t)). This vector will now be brought back to the time    domain as follows.    -   1. Obtain        ′(t)=(X(t),Y(t)) using the SMM2 method by multiplying K(t) as        X(t)=SMM2[ X (t),K(t)],        Y(t)=SMM2[ Y (t),K(t)].    -   2. Obtain        (t)=(x(t),y(t)) using the Inverse DFT function as follows        x(t)=IDFT[X(t)],        y(t)=IDFT[Y(t)].-   Output: The point    (t)=(x(t),y(t)) is the output of the SPPM method, such that    (t)=e×    (t).    4.7 Spectral Affine Point Addition (SAPA)    Let    ₀(t)=(X₀(t),Y₀(t)) and    ₁(t)=(X₁(t),Y₁(t)) be spectral representation of two points on an    elliptic curve E. The SAPA algorithm computes the affine point    addition    ₂=    ₀+    ₁ in the spectral domain. We will denote the operation using    ₂(t)=(X ₂(t),Y ₂(t))=SAPA[    ₀(t),    ₁(t)].    The steps of the SAPA method are given below;    Ψ(t)=SMM2[Y ₁(t)−Y ₀(t),(X ₁(t)−X ₀(t))⁻¹],    X ₂(t)=SMM2[Ψ(t),Ψ(t)]−X ₁(t)−X ₀(t),    Y ₂(t)=SMM2[X ₀(t)−X ₂(t),Ψ(t)]−X ₂(t)−Y ₀(t).    4.8 Spectral Affine Point Doubling (SAPD)    The SAPD algorithm computes the affine point doubling 2×    ₁(t) operation. We will denote the operation using    ₂(t)=(X ₂(t),Y ₂(t))=2×    ₁(t)=SAPD[    ₁(t)].    The steps are given below.    Ψ(t)=SMM2[3X ₀(t)+a(t),(2Y ₁(t))⁻¹],    X ₂(t)=SMM2[Ψ(t),Ψ(t)]−2X ₁(t),    Y ₂(t)=SMM2[X ₀(t)−X₂(t),Ψ(t)]−X ₂(t)−Y ₀(t).    4.9 Parameter Selection for Elliptic Curve Cryptography    In Section 2, we described the parameter selection methodology for    the SME method enriched by some sample parameters giving the key    sizes around the most popular key sizes. In this section, we will    present some similar examples for the ECC. Practically, the SMM2 and    SMM algorithms give the same bounds enforced by the inequalities (1)    and (2). The improvements described in Section 2.7 are also    applicable to SMM2 method. In Table 15, we demonstrate the suitable    parameters for the SPM (valid for both PSPM and ASPM). We would like    to add that the main characteristic of the ECC is having shorter key    sizes ranges from 160 bits to 540 bits. Thus, these tables can be    seen as a continuation of the parameter selection tables of Section    2.

TABLE 15 Standard and improved parameter selections for ECC. Bits RingDFT Root Wordsize Words k q d w u s 176 2⁴³ − 1 43 2 8 22 185 2³⁷ − 1 74−2 5 37 190 (2³⁸ − 1)/3 76 −2 5 38 192 2⁴⁷ − 1 47 2 8 24 234 (2⁵¹ − 1)/751 2 9 26 270 2⁵³ − 1 53 2 10 27 384 2⁸⁴ + 1 64 4 12 32 580 (2⁵⁸ − 1)/3116 −2 10 58 171 (2³⁷ + 1)/3 37 4 9 19 186 2³¹ − 1 62 −2 6 31 190 2³⁷ −1 37 2 10 19 210 (2⁴¹ + 1)/3 41 4 10 21 224 2³² + 1 64 2 7 32 231 2⁴¹ −1 41 2 11 21 264 2⁴³ − 1 43 2 12 22 296 2³⁷ − 1 74 −2 8 37 405 (2⁵³ +1)/3 53 4 15 27 410 2⁴¹ − 1 82 −2 10 41 540 (2⁵⁹ + 1)/3 59 2 19 30 5642⁴⁷ − 1 94 −2 12 47

5 HARDWARE ARCHITECTURES FOR SPECTRAL MODULAR ARITHMETIC

Recall that the core part of the SME, SPPM, SAPM methods consists of theSMM and SMM2 algorithms. These two multiplication algorithms (SMM andSMM2) are same except the representation set of Z_(q). From a designpoint of view, this difference is quite insignificant, and can be dealtwith in the circuit level.

Additionally, SMM_(—)2^(k) and SMM_q^(k) methods described in Section 3are the core computation for an elliptic curve point addition operationover extension fields. However we did not described them in detailed.

In this section, we describe the hardware architectures for all abovecores including SMM, SMM_(—)2^(k) and SMM_q^(k) methods by going throughtheir steps, as described in previous sections. We start by giving a toplevel model which is common for all these methods in FIG. 7. In thisarchitecture, the outputs of the convolution step (Step 1) feed the RMUXes. For the initial case, each MUX R chooses the input from Step 1,and then the reduction loop starts. The loop runs d times: at every run,the outputs of the processing units are passed to the interpolation andalso fed back to the unit itself. The processing engine waits until someparameters are generated from the parameter generation logic. After theloop runs as specified with its index, the processing units from 0 tod−1 outputs the coefficient of the resultant spectral polynomial R(t).It is important to realize that in this architecture all processingunits work in parallel.

Now, it is time to demonstrate the low level architectures such that wedescribe how actually each box in FIG. 7 works. This is also the pointwhere we particularly specify the implementations for differentstructures i.e. prime, binary extension and extension fields withmid-size characteristics.

5.1 Spectral Multiplication

In Step 1 of algorithms SMM, SMM_(—)2^(k) and SMM_q^(k), the convolutionproperty is employed. Given the vectors A(t) and B(t), we compute R(t)such that R_(i)(γ)=A_(i)(γ)B_(i)(γ)(mod f(γ)) for SMM_(—)2^(k) andR_(i)=A_(i)B_(i) (mod q) for SMM and SMM_q^(k) where i=0, 1, . . . ,d−1. This computation is accomplished using the hardware architecturegiven in FIG. 8.

Recall that our targeted rings are different for SMM, SMM_(—)2^(k) andSMM_q^(k) hence the multipliers vary. We briefly describe thesemultipliers:

SMM_(—)2^(k) over polynomial rings with f(γ)=γ^(n)−1: The multiplier isa v×v binary polynomial multiplication with a f(γ)=γ^(n)−1 reduction.Note that this is very similar to one's complement arithmetic withoutany carry propagation.

SMM_(—)2^(k) over finite fields with f(γ)=AOP(u): Since the set (8) and(9) are equal with a different order, the conversion between them isestablished using a permutation. Let an elementaεGF(2^(k))≈GF((2^(u))^(s)) be expressed in type I optimal basis as

$a = {\sum\limits_{i = 0}^{u - 1}{a_{i}{\beta^{2^{i}}.}}}$One can express a in the shifted polynomial basis as

$\overset{\_}{a} = {\sum\limits_{i = 0}^{u - 1}{{\overset{\_}{a}}_{i}{\beta^{i + 1}.}}}$The following permutation gives the conversion between these two basis:ā ₍₂ _(i) _(-1)mod(m+1)) =a ₁ for i=0, 1, . . . , m−1.Therefore the multiplication of two elements with a type IONBrepresentation can be performed by a polynomial multiplication with anAOP reduction after obtaining a shifted polynomial representation byusing the above permutation. One has to apply an inverse permutationafter the multiplication to obtain the result in normal basis form.

If the operands are same (i.e. the operation is a squaring) themultiplier is replaced by a bitwise circular shift of the binary vector,no reduction is needed.

SMM_q^(k) over a mid-size characteristic ring: Recall that our targetedrings are the Mersenne or Fermat rings for which q is of the form2^(v)±1. Hence, these multiplications can be realized by employing v×vmodulo multipliers and the complexities of these multipliers are notdifficult from the usual integer multiplication.SMM over an integer ring: Like as in SMM_q^(k) case we use some Mersenneor Fermat rings. Hence, multiplications are realized by some modularmultipliers.5.2 Spectral ReductionFor simplicity, the reduction steps which correspond to the loop with iis divided into two parts:

-   -   Parameter Generation: Here, we compute the parameters r₀, α, and        β, and feed feed them to the main processing units.    -   Processing Engine: Basically we add a multiple of the modulus to        the partial sum and then divide it by the base.        SMM_(—)2^(k) over polynomial rings with f(γ)=γ^(n)−1: We start        with the parameter generation; Step 4 of SMM_(—)2^(k)        corresponds to a partial interpolation in which a d-input XOR is        performed in order to find the zeroth coefficient of the        polynomial. Note that d⁻¹=1 since we work characteristic 2. FIG.        9 shows the architecture for these computations.

Steps 5 and 6 of SMM_(—)2^(k) as seen in the FIG. 10 are called theParameter Generation Logic (PGL) which computes the parameters r₀+β andβ. The adders seen in FIG. 10 are the usual v-bit XORs, obviously noreductions needed with these XORs.

The Processing Engine is the most resource-consuming stage and itcorresponds to Steps 7, 8, and 9 of SMM_(—)2^(k). In FIG. 11, we givethe architecture of a single processing unit. The processing engineconsists of d such units.

Both adders in FIG. 11 are v-bit XORs. The shift operation at the bottomof the figure corresponds to Step 9 of the SMM_(—)2^(k) core. As we pickw=γ, the multiplications with the coefficients of Γ(t) correspond to theconstant d−1−i bit circular shifts for processing units R_(i) where i=1,2, . . . , d−1.

SMM_(—)2^(k) over finite fields with f(γ)=AOP(u): The high levelarchitecture for this case has almost the same structure as f(γ)=γ^(n)−1case. Recall that normal basis representation is perfect for squarings.If SMM_(—)2^(k) method is considered, we can say that the reductionsteps has to be performed by using a polynomial representation becauseof shift-add methodology. In other words it is appropriate to keep thedata in polynomial representation except the squaring operation.Therefore the multiplication should be implemented as seen in FIG. 8 butif a squaring operation is emerged a basis conversion should be followedby some circular shift. This can done by some control and conversionlogic at Step 1 as seen in FIG. 12.

The architecture for the rest of the reduction loop can be performed asdescribed in FIGS. 10 and 8. Note that if a reduction is needed it is af(t)=AOP(u) reduction.

SMM_q^(k) over a mid-size characteristic ring: Since the characteristicis chosen as a typical wordsize of todays architectures. The GF(q)arithmetic corresponds to usual modular arithmetic on the units. As q ispreferred as a Fermat or Mersenne number of the form the form 2^(v)±1,the multipliers of Step 1 can be realized by employing v×v modulomultipliers. The complexities of these multipliers are not difficultfrom the usual integer multiplication. This computation is accomplishedusing the hardware architecture given in FIG. 13.

Observe that d⁻¹ is a constant v-bit number therefore, this can beaccomplished using a multi-operand addition.

Whenever d and w are both a power of 2, multiplication by d⁻¹ can bereplaced with shifts. This can be seen as follows: Let w=2^(l), andthus, we have w^(d)=2^(ld)=1 mod q. We can write d⁻¹ as d⁻¹=2^(ld-log d)mod q, hence, multiplication by d⁻¹ modulo q can be accomplished with ald−log d bit circular shift. Therefore, for special Fermat or Mersennerings, it is possible drop the multiplication by d⁻¹ in FIG. 13.

The parameter generation and processing engine parts are relativelyeasier than the ones in SMM_(—)2^(k) because no carries and overflowsexist. In fact we only need a inverter in order to realize this step asseen in FIG. 14.

The Processing Engine corresponds to Steps 5 and 6 of SMM_qk method. InFIG. 15, we give the architecture of a single processing unit. Theprocessing engine consists of d such units. The adder in FIG. 15 ismodulo q adder. The shift operation is different for each processingunit, unit i shifts R_(i) circularly by d−1−i where i=1, 2, . . . , d−1.

SMM over an integer ring: observe that SMM architecture is very similarto SMM_q^(k). However, SMM and SMM_q^(k) algorithms are quite different.To be specific in SMM we follow a carry save approach because forinteger multiplication evaluation of time polynomials gives largeinteger and coefficients of these polynomials are related. But in caseof SMM_q^(k) coefficients of time polynomials are not related andevaluation do not have any meaning. Keeping these in mind, Step 4 (i.e.Parameter Generation) of SMM corresponds to a partial interpolation andits architecture has given by FIG. 8.

The PLG of SMM (i.e. Steps 5 and 6) computes the parameters r₀+β and βcan be seen in FIG. 16. The adders seen in FIG. 16 are the usual v-bitadders and they do not need modular reductions since α, (r₀+α),(r₀+α+β)<q.

In SMM, the Processing Engine corresponds to Steps 7, 8, and 9. In FIG.17, we give the architecture of a single processing unit. The wholeengine consists of d such units. Note that both adders in FIG. 17 aremodulo q adders. The shift operation at the bottom of the figurecorresponds to Step 9 of the SMM core. As we pick was a power of 2, themultiplications with the coefficients of Γ(t) correspond to the constantd−1−i bit shifts for processing units R_(i) where i=1, 2, . . . , d−1.

The disclosed methods and apparatus can be used in a variety ofcryptographic applications. In most applications, a “message” or otherplaintext is encrypted into a ciphertext based on one or more encryptionparameters, referred to as keys. The ciphertext can be decrypted intothe plaintext based on one or more decryption parameters (keys) whichmay or may not be the same as those used in encryption. In someexamples, one or more of the keys can be made publicly available, sothat message intended for a specific recipient can be encrypted based onthe public key or keys. However, the public key is inadequate fordecryption which must be performed using at least one private key knownonly to the message recipient. Such systems are referred to a public keycryptographic systems, and tend to reduce the problems associated withkey exchange.

One example of such a system is the so-called RSA cryptosystem that isbased on exponentiation modulo the product of 2 large primes. Eachmessage recipient has an encryption key that consists of a modulus n=pq,wherein p, q are large prime numbers, and an exponent e that isrelatively prime with respect to the product (p−1)(q−1). In the RSAencryption method, a message M (a plaintext) is represented as a seriesof integers, and an encrypted text C (a ciphertext) is produced asC=M^(e) (mod n). The plaintext message can be recovered with adecryption key d that is the inverse of e (mod(p−1)(q−1)). The messageis recovered as M=C^(d) (mod pq). In this system, the public keyconsists of the pair (n,e) and the private key consists of the pair is(n,d). In this system, both encryption and decryption are based onmodular exponentiation, and the disclosed methods and apparatus can beused to produce ciphertexts and recover plaintexts from ciphtertexts.Modular exponentiation has similar application in other systems.

In some systems, messages are encrypted using standard symmetric methodsin which a message sender and a message recipient share common keys. Insuch systems, key exchange can be based on public key methods asdescribed above.

6 APPENDIX

TABLE 16 Suitable Fermat rings and the w and d values. Ring PrimeFactors w d w d  2¹⁶ + 1 65537 4 16 2 32  2²⁰ + 1 17 · 61681 32 8 410016  2²⁴ + 1 97 · 257 · 673 8 16  {square root over (8)} 32  2³² + 1 641· 6700417 4 32 2 64  2⁴⁶ + 1 257 · 4278255361 32 16  {square root over(32)} 256  2⁶⁴ + 1 274177 · 67280421310721 4 64 2 128  2⁸⁰ + 1 414721 ·44479210368001 32 32  {square root over (32)} 1024  2⁹⁶ + 1 641 ·6700417 · 8 64  {square root over (8)} 128 18446744069414584321 2¹¹² + 1449 · 2689 · 65537 · 183076097 · 2⁷ 32 {square root over (128)} 6435842984846099327 2¹²⁸ + 1 59649589127497217 · 4 128 2 2565704689200685129054721

TABLE 17 Suitable Mersenne rings and the w and d values. Ring PrimeFactors w d w d 2¹⁷ − 1 131071 2  17 −2  34 2¹⁹ − 1 524287 2  19 −2  382²³ − 1 47 · 178481 2  23 −2  46 2²⁹ − 1 233 · 1103 · 2089 2  29 −2  582³¹ − 1 2147483647 2  31 −2  62 2³⁷ − 1 223 · 616318177 2  37 −2  74 2⁴¹− 1 13367 · 164511353 2  41 −2  82 2⁴³ − 1 431 · 9719 · 2099863 2  43 −2 86 2⁴⁷ − 1 2351 · 4513 · 13264529 2  47 −2  94 2⁸³ − 1 6361 · 69431 ·20394401 2  53 −2 106 2⁸⁹ − 1 179951 · 3203431780337 2  59 −2 118 2⁶¹ −1 2305843009213693951 2  61 −2 122 2⁶⁷ − 1 193707721 · 761838257287 2 67 −2 134 2⁷¹ − 1 228479 · 48544121 · 212885833 2  71 −2 142 2⁷³ − 1439 · 2298041 · 9361973132609 2  73 −2 146 2⁷⁹ − 1 2687 · 202029703 ·1113491139767 2  79 −2 158 2⁸³ − 1 167 · 57912614113275649087721 2  83−2 166 2⁸⁹ − 1 618970019642690137449562111 2  89 −2 178 2⁹⁷ − 1 11447 ·13842607235828485645766393 2  97 −2 196 2¹⁰¹ − 7432339208719 ·341117531003194129 2 101 −2 202 1 2¹⁰³ − 2550183799 ·3976656429941438590393 2 103 −2 206 1 2¹⁰⁷ −162259276829213363391578010288127 2 107 −2 214 1 2¹⁰⁹ − 745988807 ?870035986098720987332873 2 109 −2 218 1 2¹¹³ − 3391 · 23279 · 65993 ·1868569 · 2 113 −2 226 1 1066818132868207 2¹²⁷ −170141183460469231731687303715884105727 2 127 −2 254 1

TABLE 18A Suitable pseudo Fermat rings and the w and d values. RingPrime Factors Modulus w d w d 2¹⁷ + 1 3 · 43691 (2¹⁷ + 1)/3  −2, 4 17 234 2¹⁹ + 1 3 · 174763 (2¹⁹ + 1)/3  −2, 4 19 2 38 2²⁰ + 1 17 · 61681(2²⁰ + 1)/17  4 20 2 40 2²¹ + 1 3² · 43 · 5419 (2²¹ + 1)/9  −2, 4 21 242 2²² + 1 5 · 397 · 2113 (2²² + 1)/5  4 22 2 44 2²³ + 1 3 · 2796203(2²³ + 1)/3  −2, 4 23 2 46 2²⁸ + 1 17 · 15790321 (2²⁸ + 1)/17  4 28 2 562²⁹ + 1 3 · 59 · 3033169 (2²⁹ + 1)/3  −2, 4 29 2 58 2³¹ + 1 3 ·715827883 (2³¹ + 1)/3  −2, 4 31 2 62 2³⁴ + 1 5 · 137 · 953 · 26317(2³⁴ + 1)/5  4 34 2 68 2³⁷ + 1 3 · 25781083 · 1777 (2³⁷ + 1)/3  −2, 4 372 74 2³⁸ + 1 5 · 229 · 457 · 525313 (2³⁸ + 1)/5  4 38 2 76 2³⁹ + 1 3² ·22366891 · 2731 (2³⁹ + 1)/9  −2, 4 39 2 78 2⁴⁰ + 1 257 · 4278255361(2⁴⁰ + 1)/257 4 40 2 80 2⁴¹ + 1 3 · 83 · 8831418697 (2⁴¹ + 1)/3  −2, 441 2 82 2⁴³ + 1 3 · 2932031007403 (2⁴³ + 1)/3  −2, 4 43 2 86 2⁴⁴ + 1 17· 353 · 2931542417 (2⁴⁴ + 1)/17  4 44 2 88 2⁴⁶ + 1 5 · 277 · 1013 · 1657· 30269 (2⁴⁶ + 1)/5  4 46 2 92 2⁴⁷ + 1 3 · 283 · 165768537521 (2⁴⁷ +1)/3  −2, 4 47 2 94 2⁵² + 1 17 · 308761441 · 858001 (2⁵² + 1)/17  4 52 2102 2⁵³ + 1 3 · 107 · 28059810762433 (2⁵³ + 1)/3  −2, 4 53 2 106 2⁵⁶ + 1257 · 54410972897 · 5153 (2⁵⁸ + 1)/257 4 56 2 112 2⁵⁷ + 1 3² · 571 ·160465489 · 174763 (2⁵⁷ + 1)/9  −2, 4 57 2 114 2⁵⁸ + 1 5 · 107367629 ·536903681 (2⁵⁸ + 1)/5  4 58 2 116 2⁵⁹ + 1 3 · 1824726041 · 37171 · 2833(2⁵⁹ + 1)/3  −2, 4 59 2 118 2⁶⁰ + 1 17 · 241 · 4562284561 · 61681 (2⁶⁰ +1)/17  4 60 2 120 2⁶¹ + 1 3 · 768614336404564651 (2⁶¹ + 1)/3  −2, 4 61 2122 2⁶² + 1 5 · 384773 · 49477 · (2⁶² + 1)/5  4 62 2 124 8681 · 55812⁶⁵ + 1 3 · 11 · 131 · (2⁶⁵ + 1)/3  −2, 4 65 2 130 409891 · 7623851 ·2731

TABLE 18B Suitable pseudo Fermat rings and the w and d values. RingPrime Factors Modulus w d w d  2⁶⁶ + 1 5 · 13 · 397 · 4327489 · 312709 ·2113 (2⁶⁶ + 1)/5  4 66 2 132  2⁶⁷ + 1 3 · 6713103182899 · 7327657 (2⁶⁷ +1)/3  −2, 4 67 2 134  2⁶⁸ + 1 17² · 2879347902817 · 354689 (2⁶⁸ + 1)/1724 68 2 136  2⁷¹ + 1 3 · 56409643 · 13952598148481 (2⁷¹ + 1)/3  −2, 4 712 142  2⁷³ + 1 3 · 1795918038741070627 · 1753 (2⁷³ + 1)/3  −2, 4 73 2146  2⁷⁴ + 1 5 · 149 · 593 · 184481113 · 231769777 (2⁷⁴ + 1)/5  4 74 2148  2⁷⁶ + 1 17 · 1217 · 24517014940753 · 148961 (2⁷⁶ + 1)/17  4 76 2152  2⁷⁹ + 1 3 · 201487636602438195784363 (2⁷⁹ + 1)/3  −2, 4 79 2 158 2⁸² + 1 5 · 181549 · 12112549 · 43249589 · 10169 (2⁸² + 1)/5  4 82 2164  2⁸³ + 1 3 · 499 · 1163 · 13455809771 · 155377 · 2657 (2⁸³ + 1)/3 −2, 4 83 2 166  2⁸⁵ + 1 3 · 11 · 26831423036065352611 · 43691 (2⁸⁵ +1)/33  −2, 4 85 2 170  2⁸⁶ + 1 5 · 173 · 1759217765581 · 500177 · 101653(2⁸⁶ + 1)/5  4 86 2 172  2⁸⁷ + 1 3² · 59 · 96076791871613611 · 3033169(2⁸⁷ + 1)/531 −2, 4 87 2 174  2⁸⁸ + 1 257 · 43872038849 · 119782433 ·229153 (2⁸⁸ + 1)/257 4 88 2 176  2⁸⁹ + 1 3 · 179 · 18584774046020617 ·62020897 (2⁸⁹ + 1)/3  −2, 4 89 2 178  2⁹¹ + 1 3 · 43 · 25829691707 ·1210483 · 2731 · 224771 (2⁹¹ + 1)/129 −2, 4 91 2 182  2⁹² + 1 17 ·291280009243618888211558641 (2⁹² + 1)/17  4 92 2 184  2⁹³ + 1 3² ·529510939 · 2903110321 · 715827883 (2⁹³ + 1)/9  −2, 4 93 2 186  2⁹⁴ + 15 · 7484047069 · 1407374715788113 · 31817 (2⁹⁴ + 1)/5  4 94 2 188  2⁹⁶ +1 641 · 18446744069414584321 · 6700417 (2⁹⁶ + 1)/641 4 96 2 192  2⁹⁷ + 13 · 971 · 1553 · 1100876018364883721 · 31817 (2⁹⁷ + 1)/3  −2, 4 97 2 1942¹⁰¹ + 1 3 · 845100400152152934331135470251 (2¹⁰¹ + 1)/3  −2, 4 101 2202 2¹⁰³ + 1 3 · 8142767081771726171 · 415141630193 (2¹⁰³ + 1)/3  −2, 4103 2 206 2¹⁰⁴ + 1 257 · 78919881726271091143763623681 (2¹⁰⁴ + 1)/257 4104 2 208 2¹⁰⁶ + 1 5 · 15358129 · 586477649 · 1801439824104653 (2¹⁰⁶ +1)/5  4 106 2 212 2¹⁰⁷ + 1 3 · 84115747449047881488635567801 (2¹⁰⁷ +1)/3  −2, 4 107 2 214 2¹⁰⁹ + 1 3 · 2077756847362348863128179 · 104124649(2¹⁰⁹ + 1)/3  −2, 4 109 2 218

TABLE 18C Suitable pseudo Fermat rings and the w and d values. RingPrime Factors Modulus w d w d 2¹¹¹ + 1 3² · 1777 · 3331 · 17539 ·(2¹¹¹ + 1)/9   −2, 4 111 2 222 25781083 · 107775231312019 2¹¹³ + 1 3 ·227 · 48817 · (2¹¹³ + 1)/3   −2, 4 113 2 226 636190001 ·491003369344660409 2¹¹⁴ + 1 5 · 13 · 229 · 457 · 131101 · (2¹¹⁴ + 1)/65 −2, 4 114 2 228 160969 · 525313 · 275415303169 2¹¹⁶ + 1 17 · 59393 ·(2¹¹⁶ + 1)/17  4 116 2 232 82280195167144119832390568177 2¹¹⁸ + 1 5 ·1181 · 3541 · 157649 · (2¹¹⁸ + 1)/5   −2, 4 118 2 236 174877 · 5521693 ·104399276341 2¹²⁰ + 1 97 · 257 · 673 · 394783681 · (2¹²⁰ + 1)/257  32 48{square root over (32)} 96 46908728641 · 4278255361 2¹²¹ + 1 3 · 683 ·117371 · (2¹²¹ + 1)/4098 −2, 4 121 2 242 110541845827978004557360611072¹²² + 1 5 · 733 · 1709 · 3456749 · (2¹²² + 1)/5   4 122 2 2448831418697 · 13194317913029593 2¹²³ + 1 3² · 83 · 739 · 165313 · (2¹²³ +1)/747  −2, 4 123 2 146 8831418697 · 13194317913029593 2¹²⁴ + 1 17 ·290657 · 3770202641 · (2¹²⁴ + 1)/17  4 124 2 248 11416291804019768958732¹²⁷ + 1 3 · 56713727820156410577229101238628035243 (2¹²⁷ + 1)/3   −2, 4127 2 254

TABLE 19A Suitable pseudo Mersenne rings and the w and d values. RingPrime Factors Modulus w d w d  2²⁵ − 1 31 · 601 · 1801 (2²⁵ − 1)/31  225 −2 50  2²⁶ − 1 3 · 2731 · 8191 (2²⁶ − 1)/3   2 26 −2 52  2²⁷ − 1 7 ·73 · 262657 (2²⁷ − 1)/511  2 27 −2 54  2³⁴ − 1 3 · 43691 · 131071 (2³⁴ −1)/3   2 34 −2 68  2³⁵ − 1 31 · 71 · 127 · 122921 (2³⁵ − 1)/3937 2 35 −270  2³⁸ − 1 3 · 174763 · 524287 (2³⁸ − 1)/3   2 38 −2 76  2³⁹ − 1 7 · 79· 8191 · 121369 (2³⁹ − 1)/7   2 39 −2 78  2⁴⁶ − 1 3 · 47 · 178481 ·2796203 (2⁴⁶ − 1)/3   2 46 4 23  2⁴⁹ − 1 127 · 4432676798593 (2⁴⁹ −1)/127  2 49 −2 98  2⁵¹ − 1 7 · 103 · 2143 · 11119 · 131071 (2⁵¹ −1)/7   2 51 −2 102  2⁵⁷ − 1 7 · 32377 · 524287 · 1212847 (2⁵⁷ − 1)/7   257 −2 114  2⁵⁸ − 1 3 · 59 · 233 · 1103 · 2089 · 3033169 (2⁵⁸ − 1)/3   258 4 29  2⁶² − 1 3 · 715827883 · 2147483647 (2⁶² − 1)/3   2 62 −2 124 2⁶⁴ − 1 3 · 5 · 17 · 257 · 641 · 65537 · 6700417 (2⁶⁴ − 1)/255  2 64 −2128  2⁶⁵ − 1 31 · 8191 · 145295143558111 (2⁶⁵ − 1)/31  2 65 −2 130  2⁷⁴− 1 3 · 223 · 1777 · 25781083 · 616318177 (2⁷⁴ − 1)/3   2 74 −2 148  2⁷⁵− 1 7 · 31 · 151 · 601 · 1801 · 100801 · 10567201 (2⁷⁵ − 1)/217  2 75 −2150  2⁷⁸ − 1 3² · 7 · 79 · 2731 · 8191 · 121369 · 22366891 (2⁷⁸ − 1)/63 2 78 4 39  2⁸² − 1 3 · 83 · 13367 · 164511353 · 8831418697 (2⁸² − 1)/3  2 82 4 41  2⁸⁵ − 1 31 · 131071 · 9520972806333758431 (2⁸⁵ − 1)/31  2 85−2 170  2⁸⁶ − 1 3 · 431 · 9719 · 2099863 · 2932031007403 (2⁸⁶ − 1)/3   286 4 43  2⁹¹ − 1 127 · 911 · 8191 · 112901153 · 23140471537 (2⁹¹ −1)/127  2 91 −2 182  2⁹³ − 1 7 · 2147483647 · 658812288653553079 (2⁹³ −1)/7   2 93 −2 186  2⁹⁴ − 1 3 · 283 · 2351 · 4513 · (2⁹⁴ − 1)/3   2 94 447 13264529 · 165768537521 2¹⁰⁶ − 1 3 · 107 · 6361 · 69431 · 20394401 ·(2¹⁰⁶ − 1)/3   2 106 4 53 28059810762433

TABLE 19B Suitable pseudo Mersenne rings and the w and d values. RingPrime Factors Modulus w d w d 2¹¹¹ − 1 7 · 223 · 321679 · 26295457 ·616318177 · (2¹¹¹ − 1)/7   2 111 −2 222 319020217 2¹¹⁴ − 1 3² · 7 · 571· 32377 · 174763 · (2¹¹⁴ − 1)/63  2 114 4 57 524287 · 1212847 ·160465489 2¹¹⁵ − 1 31 · 47 · 14951 · 178481 · 4036961 · (2¹¹⁵ − 1)/14572 115 −2 230 2646507710984041 2¹¹⁸ − 1 3 · 2833 · 37171 · 179951 · (2¹¹⁸− 1)/3   2 118 4 59 1824726041 · 3203431780337 2¹²¹ − 1 23 · 89 · 727 ·(2¹²¹ − 1)/2047 2 121 −2 242 1786393878363164227858270210279 2¹²² − 1 3· 768614336404564651 · (2¹²² − 1)/3   2 122 −2 244 23058430092136939512¹²⁸ − 1 3 · 5 · 17 · 257 · 641 · 65537 · (2¹²⁸ − 1)/255  2 128 4 64274177 · 6700417 · 67280421310721

TABLE 20 Parameter selection by using CRT for SME with SMM. Bits RingDFT Root Words Wordsize k

q d w s u 518 q1 2³⁷ − 1 74 −2 37 14 q2 (2³⁷ + 1)/3 74 2 37 1,071 q1(2⁵¹ − 1)/7 102 −2 51 21 q2 2⁵³ − 1 106 −2 53 2,130 q1 (2⁷¹ − 1) 141 −271 30 q2 (2⁷¹ + 1)/3 141 2 71 4,171 q1 (2⁹⁷ − 1) 194 −2 97 43 q2 (2⁹⁷ +1)/3 194 2 97

TABLE 21A Improved parameter selection by using CRT for SME. Bits RingDFT Root Words Wordsize k

q d w s u 512 q1 2¹⁶ + 1 32 2 16 32 q2 2¹⁷ − 1 34 −2 17 q3 (2¹⁷ + 1)/334 2 17 q4 2¹⁹ − 1 38 −2 19 q5 (2²⁰ + 1) 40 2 20 551 q1 2²⁹ − 1 58 −2 2919 q2 2³¹ − 1 62 −2 31 560 q1 2³¹ − 1 32 2 16 35 q2 (2³¹ + 1)/3 32 −2 16q3 2³² + 1 32 4 16 1,054 q1 2³¹ − 1 62 −2 31 34 q2 (2³¹ + 1)/3 62 2 31q3 2³² + 1 64 2 32 1,148 q1 2⁴¹ − 1 82 −2 41 28 q2 (2⁴¹ + 1)/3 82 2 412,015 q1 2³¹ − 1 62 −2 31 65 q2 (2³¹ + 1)/3 62 2 31 q3 2³² + 1 64 2 32q4 2⁶⁴ + 1 64 2 32 2,052 q1 2⁷¹ − 1 71 2 36 57 q2 (2⁷¹ + 1)/3 71 −2 362,067 q1 2⁵³ − 1 106 −2 53 39 q2 (2⁵³ + 1)/3 106 2 53

TABLE 21B Improved parameter selection by using CRT for SME. Bits RingDFT Root Words Wordsize k

q d w s u q1 (2⁵⁸ + 1)/5 116 2 58 71 4,118 q2 2⁵⁹ − 1 118 −2 59 q3(2⁵⁹ + 1)/3 118 2 59 4,161 q1 2⁷³ − 1 146 −2 73 57 q2 (2⁷³ + 1)/3 146 273 8,216 q1 2⁷⁹ − 1 158 −2 79 104 q2 (2⁷⁹ + 1)/5 158 2 79 q3 2⁸³ − 1 166−2 83 8,357 q1 2⁶¹ − 1 122 −2 61 137 q2 (2⁶¹ + 1)/3 122 2 61 q3 (2⁶² +1)/5 124 2 62 q4 2⁶² + 4 128 2 64 q5 (2⁶⁴ − 1)/31 130 −2 65 8,484 q12¹⁰¹ − 1 202 −2 101 84 q2 (2¹⁰¹ +1)/3 202 2 101 16,428 q1 (2¹¹¹ − 1)/7222 −2 111 148 q2 (2¹¹¹ + 1)/9 222 2 111 q3 2¹¹³ − 1 226 −2 113 16,827q1 2⁷¹ − 1 142 −2 71 237 q2 2⁷³ − 1 146 −2 73 q3 (2⁷⁴ + 1)/5 148 4 74 q4(2⁷⁵ − 1)/217 150 −2 75 q5 (2⁷⁶ + 1)/17 152 4 76 q6 2⁷⁹ − 1 158 −2 79 q7(2⁷⁹ + 1)/3 158 2 79

REFERENCES

-   [1] R. E. Blahut. Fast Algorithms for Digital Signal Processing,    Chapter 6, Addison-Wesley Publishing Company, 1985.-   [2] A. Halbutogullari and    K. Koç. Mastrovito multiplier for general irreducible polynomials.    IEEE Transactions on Computers, 49(5):503-518, May 2000.-   [3]    K. Koç. High-Speed RSA Implementation. TR 201, RSA Laboratories, 73    pages, November 1994.-   [4]    K. Koç. RSA Hardware Implementation. TR 801, RSA Laboratories, 30    pages, April 1996.-   [5] H. J. Naussbaumer. Fast Fourier Transform and Convolution    Algorithms, Chapter 8, Springer, Berlin, Germany, 1982.-   [6] J. M. Pollard. The fast Fourier transform in a finite field.    Mathematics of Computation, vol. 25, pp. 365-374, 1971.-   [7] A. V. Aho, J. E. Hoperoft and J. D. Ullman. The Design and    Analysis of Computer Algorithms, Chapter 7, Addison-Wesley    Publishing Company, 1974.-   [8] A. Schönhage and V. Strassen, “Schnelle multiplikation grafer    zahlen,” Computing, vol. 7, pp. 281-292, 1971.

As shown above, the disclosed methods and apparatus have numerousapplications in secure communications and message authentication. Avariety of numerical parameters are used include numericalrepresentations of messages (both plaintext and ciphertext), keys,signatures, modulus, and other parameters. For convenience, theseparameters can be referred to as cryptographic parameters. SMM asdescribed herein can be used with structures for which modularmultiplication is necessary, such as, for example, GF(q^(k)), q prime,or finite rings in general. SMM can be used in any operation possiblewith such structures such as, for example, exponentiation over Z_(n),elliptic point addition over GF(q^(k)), hyperelliptic curve addition,pairings or other operation over a polynomial ring Z[t]/(f(t)), oroperations with p-adic numbers. In some examples, modular arithmeticoperations are described with reference to operations on integerrepresentations of cryptographic parameters. This is for convenienceonly, and modular operations are similarly configured for operations inwhich representations of cryptographic parameters are based onpolynomial or other representations.

In some examples, modular products of operands x and y are obtainedbased on a product of xy. In other examples, such modular products arebased on a product xyr^(−d), wherein d is an integer. Such modularproducts are typically used in so-called Montgomery products orMontgomery reductions. Modular reductions of either are referred toherein as modular produces of x and y.

The methods disclosed herein can be implemented in software as computerexecutable instructions, or can be implemented in dedicated or adaptedhardware such as application specific integrated circuits (ASICs) orother hardware cryptographic processor. Computer executable instructionscan be stored in RAM, ROM, on hard disk, floppy disk, CD, DVD or otherstorage medium as convenient.

In view of the many possible embodiments to which the principles of thedisclosed technology may be applied, it should be recognized that theillustrated embodiments are only preferred examples of the and shouldnot be taken as limiting the scope of the technology. Rather, the scopeof the invention is defined by the following claims. We therefore claimas our invention all that comes within the scope and spirit of theseclaims.

We claim:
 1. A method of modular exponentiation, compromising: at acryptographic processor, receiving a message text m, an exponent e, anda modulus n, represented as a number of words s; determining a v₀multiple of the modulus n, representing the v₀ multiple of the modulus nas a polynomial θ, and obtaining a Fourier transform of the polynomialθ, wherein v₀ is an inverse of a least significant word of n modulusb=2^(u), and u is a word length; obtaining a Fourier transform Δ of apolynomial representation δ associated with a square of a Montgomerycoefficient modulo n; obtaining a polynomial representation M of aFourier transform of the message text m; performing exponentiation of Min the Fourier transform domain by a series of j spectral modularmultiplications to obtain a transform domain representation, wherein jis a number of bits in the exponent e; and obtaining an inverse Fouriertransform of the transform domain representation to obtain an integerassociated with m^(e) (mod n).
 2. A modular multiplication method,comprising: at a cryptographic processor, receiving first and secondcryptographic parameters a and b, and a ring parameter q; determiningFourier transforms A, B of the first and second parameters a, b,respectively, based on polynomial representations of the first andsecond parameters; determining a component wise modular product R of theFourier transforms of A, B with respect to a modulus q; determining a v₀multiple of a modulus n, representing the v₀ multiple of the modulus nas a polynomial θ, and obtaining a Fourier transform of the polynomialθ, wherein v₀ is an inverse of a least significant word of n modulusb=2^(u), and u is a word length; iteratively processing the modularproduct R based on a Fourier transform of the polynomial θ andcomponent-wise modular products with a vector based on a principal rootof unity with respect to the modulus q; and outputting the modularproduct R of the last iteration.
 3. A non-transitory computer readablemedium, comprising computer-executable instructions for a method ofmodular exponentiation, comprising: at a cryptographic processor,receiving a message text m, an exponent e, and a modulus n, representedas a number of words s; determining a v₀ multiple of the modulus n,representing the v₀ multiple of the modulus n as a polynomial θ, andobtaining a Fourier transform of the polynomial θ, wherein v₀ is aninverse of a least significant word of n modulus b=2^(u), and u is aword length; obtaining a Fourier transform Δ of a polynomialrepresentation δ associated with a square of a Montgomery coefficientmodulo n; obtaining a polynomial representation M of a Fourier transformof the message text m; performing exponentiation of M in the Fouriertransform domain by a series of j spectral modular multiplications toobtain a transform domain representation, wherein j is a number of bitsin the exponent e; and obtaining an inverse Fourier transform of thetransform domain representation to obtain an integer associated withm^(e) (mod n).